DEV Community: Luis A. Obando

I Stopped Vibe Coding and Started Shipping: Task-Driven Development with AI

Luis A. Obando — Fri, 20 Mar 2026 13:39:30 +0000

A year ago I wrote about how I built ExamGenius with vibe coding and Claude. The premise was simple: tell the AI what you want, it generates the code, you tweak, and in 20 hours you have a complete app that would normally take weeks. 85% time reduction. Working code. All good.

A year later, the industry has moved at breakneck speed. Models are more capable, IDEs integrate AI natively, and more teams are shipping AI-generated code to production every day. But there's a problem nobody wants to admit: we're shipping massive amounts of code that nobody truly reviewed.

It feels productive. But the result is often code that's hard to maintain, untested, poorly structured, and with zero traceability on why certain decisions were made. Vibe coding feels great until you have to debug something that neither you nor the AI remember writing.

This post is about what comes after vibe coding.

The real problem: without structure, even the best dev creates noise

Let's be honest: AI often produces code that many senior devs couldn't write as quickly or as cleanly. The problem isn't code quality; it's lack of direction.

Without clear tasks, the AI wanders. You say "fix this bug" and it refactors half the module. Without defined scope, "improve this" becomes a 15-file change nobody asked for. You lose visibility into what was done, what's left, and what broke along the way. It's not a competence problem — it's a context problem. The AI doesn't know what decisions you've already made, what patterns your codebase follows, or what the actual scope of your request is.

With ExamGenius it worked because it was a greenfield project — a single session, no legacy code, no other collaborators. But in real projects that evolve week after week, pure vibe coding doesn't scale.

And in the enterprise — where there are teams, code reviews, compliance requirements, and real production systems — it's simply not enough. You can't show up to a PR review saying "the AI generated it and it looked fine." You need to explain what was done, why, and what criteria were used to validate it.

The alternative: Task-Driven Development

The idea is simple: define the work before executing it.

Instead of opening a chat and saying "build me this," you define a task with clear scope, break it into subtasks if needed, and give the AI the precise context to execute each piece. The flow looks like this:

1. Spec: define what you want to achieve and why
2. Tasks: break the work into manageable units
3. Subtasks: if a task is large, break it down further
4. Plan: before writing code, the AI researches the codebase and writes an implementation plan in the task. You review and approve it before a single line of code is touched
5. Execution: the AI works within the defined scope
6. Review: you validate before moving on

Step 4 is key and most people skip it. Asking the AI to plan before executing gives you a review checkpoint that's worth its weight in gold. The plan reflects the current state of the codebase — it's not theory, it's a concrete proposal you can approve, adjust, or reject. It's the difference between "just do it" and "tell me how you'd do it, then do it."
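
To make the plan checkpoint concrete, here is a minimal sketch of a task that refuses to start until a human has approved the plan. The `Task` class, the `Status` values, and the method names are all hypothetical; they illustrate the gate and are not taken from any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    TODO = "todo"
    PLANNED = "planned"          # the AI has written a plan into the task
    APPROVED = "approved"        # a human has reviewed and accepted the plan
    IN_PROGRESS = "in_progress"


@dataclass
class Task:
    task_id: str
    title: str
    plan: str = ""
    status: Status = Status.TODO

    def propose_plan(self, plan: str) -> None:
        """The AI records its implementation plan in the task."""
        self.plan = plan
        self.status = Status.PLANNED

    def approve_plan(self) -> None:
        """The human reviewer signs off on the plan."""
        if self.status is not Status.PLANNED:
            raise ValueError("there is no pending plan to approve")
        self.status = Status.APPROVED

    def start(self) -> None:
        """Execution is only allowed once the plan is approved."""
        if self.status is not Status.APPROVED:
            raise PermissionError(f"{self.task_id}: plan not approved yet")
        self.status = Status.IN_PROGRESS


task = Task("TASK-2", "Refactor handle_message")
task.propose_plan("Extract four helpers; keep handle_message as orchestrator")

blocked = ""
try:
    task.start()                 # too early: nobody has reviewed the plan
except PermissionError as exc:
    blocked = str(exc)

task.approve_plan()
task.start()                     # now allowed
```

The point of the sketch is the ordering: `start()` fails loudly when the review step is skipped, so "just do it" is no longer the default path.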

The developer goes back to being the architect and reviewer, not a spectator. You decide what gets done, the AI executes. And if something doesn't look right, you correct it before it propagates.

Here's what closes the loop: the AI doesn't just execute tasks — it also helps you define them. You can ask it to suggest subtasks, identify edge cases, or propose DoDs based on the project context. You provide the general direction, the AI helps you specify with enough detail, and then executes against that specification. It's a collaborative cycle where both sides contribute what they do best.

This applies to side projects and enterprise teams alike. The difference is that in the enterprise, it's not optional.

Definition of Done: the AI has to meet criteria too

This is where it gets interesting. Each task can carry a set of Definition of Done criteria (DoDs): conditions that must be met before the task is marked complete.

This isn't bureaucracy. It's verifiable quality:

Tests pass
No regressions
Code follows project patterns
Translations are complete
Logging is consistent

The AI stops being a black box that "generates code" and becomes an executor that has to meet concrete criteria. If it doesn't meet them, the task isn't done.

In an enterprise context, DoDs are the bridge between "the AI generated code" and "this code is production-ready." They're the evidence that someone (human or AI) validated that the work meets the team's standards.
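
As a rough sketch of that idea, a DoD can be modeled as a set of named checks that must all pass before a task closes. The function names and the placeholder lambdas are assumptions for illustration; real checks would run the test suite, a linter, a translation-coverage script, and so on.

```python
from typing import Callable, Dict, List

# Each criterion is a name plus a zero-argument check returning True when met.
DoD = Dict[str, Callable[[], bool]]


def unmet_criteria(dod: DoD) -> List[str]:
    """Return the names of the criteria that currently fail."""
    return [name for name, check in dod.items() if not check()]


def mark_done(task_id: str, dod: DoD) -> str:
    """Refuse to close a task while any DoD criterion is unmet."""
    failing = unmet_criteria(dod)
    if failing:
        raise RuntimeError(f"{task_id} is not done, unmet: {', '.join(failing)}")
    return f"{task_id}: Done"


# Placeholder checks; one of them is deliberately failing.
example_dod: DoD = {
    "tests pass": lambda: True,
    "no regressions": lambda: True,
    "translations complete": lambda: False,
}
```

Calling `mark_done("TASK-4", example_dod)` raises until the failing check is fixed, which is the behavior you want from an executor that has to meet criteria.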

Backlog.md: the backlog that lives in your repo

The tool I use for this is Backlog.md — a task management system that lives directly in your repository as markdown files.

Why not Jira or Trello? Because, out of the box, the AI in your IDE can't read your Jira board. Backlog.md integrates with the IDE via MCP (Model Context Protocol), which means the AI can:

Read pending tasks
Understand the context of what it needs to do
Update status when it's done
Check DoDs before marking something as Done

Everything is versioned in git. Every task, every decision, every status change is a commit. Full traceability — you can see exactly what was done, when, and in what order.

You don't need to leave your code to manage your work. The backlog is right there, next to your src/.
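
To show why plain markdown in the repo is easy for a tool (or an AI) to work with, here is a small sketch that reads a DoD checklist from a task file. The file layout in `SAMPLE_TASK` is a hypothetical one made up for this example; it is not Backlog.md's actual format, and the function names are assumptions too.

```python
import re
from typing import Dict

# Matches markdown checklist items such as "- [x] Tests pass".
CHECKBOX = re.compile(r"^- \[(?P<mark>[ xX])\] (?P<text>.+)$")


def parse_dod(task_markdown: str) -> Dict[str, bool]:
    """Map each checklist item in a task file to whether it is ticked."""
    items: Dict[str, bool] = {}
    for line in task_markdown.splitlines():
        match = CHECKBOX.match(line.strip())
        if match:
            items[match["text"]] = match["mark"].lower() == "x"
    return items


def ready_to_close(task_markdown: str) -> bool:
    """A task can move to Done only if it has criteria and all are ticked."""
    items = parse_dod(task_markdown)
    return bool(items) and all(items.values())


# Hypothetical task file content, for illustration only.
SAMPLE_TASK = """\
# TASK-5 Interaction tracking
status: In Progress

## Definition of Done
- [x] Tests pass
- [ ] Logging is consistent
"""
```

Because the file is just text under version control, every ticked box shows up as a diff in git history.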

For personal projects and small teams, Backlog.md is ideal for its simplicity. In the enterprise, the concept is the same but the tool would connect to existing tooling — Jira, Linear, Azure DevOps. What matters isn't the specific tool, but that the AI has access to the backlog, can read task context, and can validate DoDs. The task-driven development pattern works the same regardless of where the tasks live.

Real-world example: Finia

Finia is a Telegram bot I'm building for personal expense tracking in Costa Rica. It parses bank notifications, categorizes expenses with AI, generates reports, and supports multiple languages. The backend is Python with SQLAlchemy, runs on Kubernetes, and uses structlog for structured logging.

All recent development was done with task-driven development. Here are three concrete examples:

Refactoring handle_message

The main message handler was a 236-line monolith — a single async def with all the logic for budget input, edit amount, rate limiting, chat agent, notifications, and tutorial. It worked, but it was impossible to test or modify without breaking something.

I created TASK-2 with 4 subtasks:

Extract budget input handler
Extract edit amount handler
Extract chat agent invocation
Extract notification sending and tutorial detection

The AI executed each subtask separately. I reviewed each step. The result: 7 focused functions and a handle_message of ~40 lines that only orchestrates. All 79 tests passed without changes.

# Before: 236 lines in a single function
async def handle_message(update, context):
    ...  # everything mixed together

# After: clean orchestrator
async def handle_message(update, context):
    user = await get_user(update.effective_user.id)
    if user is None:
        return  # unknown user: nothing to do
    text = update.message.text.strip()

    # Each helper returns True when it has fully handled the message
    if await _handle_budget_input(update, context, user, text):
        return
    if await _handle_edit_amount(update, context, user, text):
        return
    # Stop here when the rate-limit check does not pass
    if not await _check_rate_limit(update, user):
        return

    # Otherwise hand the text to the chat agent and send follow-up notifications
    response = await _run_chat_agent(update, user, text)
    await _send_expense_notifications(update, context, user, response)
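
One practical payoff of this shape is that each extracted handler can be exercised on its own. The sketch below shows the same "try each handler, stop at the first that claims the message" pattern in a self-contained form. The two handlers and their matching rules are hypothetical stand-ins, not Finia's real logic.

```python
import asyncio
from typing import Awaitable, Callable, List

# A handler inspects the message and returns True if it fully handled it.
Handler = Callable[[str], Awaitable[bool]]


async def handle_budget_input(text: str) -> bool:
    # Hypothetical rule: messages starting with "budget " are budget updates.
    return text.startswith("budget ")


async def handle_edit_amount(text: str) -> bool:
    # Hypothetical rule: a bare number is treated as an amount correction.
    return text.replace(".", "", 1).isdigit()


async def dispatch(text: str, handlers: List[Handler]) -> str:
    """Try each handler in order; fall through to the chat agent."""
    for handler in handlers:
        if await handler(text):
            return handler.__name__
    return "chat_agent"


HANDLERS: List[Handler] = [handle_budget_input, handle_edit_amount]

first = asyncio.run(dispatch("budget 50000", HANDLERS))
second = asyncio.run(dispatch("1500.50", HANDLERS))
third = asyncio.run(dispatch("coffee 1200", HANDLERS))
```

Each handler is a small async function with a boolean contract, so a unit test only needs a string in and a `True`/`False` out.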

i18n sweep

Finia supports Spanish and English. But after several development iterations, there were ~50 hardcoded English messages scattered across the codebase — buttons, errors, labels, prompts.

I created TASK-4 and the AI made three passes:

Error messages and buttons in callbacks.py, commands.py, chat.py
Usage blocks (/expense, /sinpe), budget displays, category labels
Change the pre-registration default language to Spanish (most users are Costa Rican)

The DoD was clear: all user-facing messages translated, admin commands intentionally excluded. Each pass was reviewed before continuing.
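
A DoD like "all user-facing messages translated" is easiest to verify when messages live in a catalog rather than inline. Here is a minimal sketch, assuming a dictionary-based catalog; the keys, strings, and function names are illustrative and not Finia's actual code.

```python
from typing import Dict, Set

DEFAULT_LANG = "es"  # pre-registration default, as described above

# Hypothetical message catalog; keys and strings are illustrative only.
MESSAGES: Dict[str, Dict[str, str]] = {
    "es": {
        "budget_saved": "Presupuesto guardado",
        "rate_limited": "Demasiados mensajes",
    },
    "en": {
        "budget_saved": "Budget saved",
    },
}


def t(key: str, lang: str = DEFAULT_LANG) -> str:
    """Look up a message, falling back to the default language."""
    catalog = MESSAGES.get(lang, MESSAGES[DEFAULT_LANG])
    return catalog.get(key, MESSAGES[DEFAULT_LANG][key])


def missing_translations(lang: str) -> Set[str]:
    """Keys present in the default language but absent in `lang`."""
    return set(MESSAGES[DEFAULT_LANG]) - set(MESSAGES.get(lang, {}))
```

A check such as `missing_translations("en") == set()` can then run as part of the DoD instead of relying on someone spotting a stray hardcoded string.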

Interaction tracking

I needed metrics for a CloudWatch dashboard. I created TASK-5: add command_executed to every command handler and callback_executed to the callback handler.

The pattern had to be consistent:

logger.info("command_executed", service="finia-bot", command="help", user_id=user.id)
logger.info("callback_executed", service="finia-bot", callback="chcat", user_id=user.id)

During review, I caught that /start was using telegram_id instead of user_id (because the user doesn't exist yet). We fixed it to include both. I also found that commands.py hadn't been staged in the commit — the AI had edited it but didn't include it. Without step-by-step review, that would have shipped to production incomplete.
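
One way to keep the pattern consistent is to build the event fields in a single place. The helper below is a sketch under that assumption: `interaction_event` is a hypothetical name, and it only prepares the arguments for a structlog-style `logger.info(event, **fields)` call like the ones shown above.

```python
from typing import Any, Dict, Optional, Tuple

SERVICE = "finia-bot"


def interaction_event(
    kind: str,
    name: str,
    user_id: Optional[int] = None,
    telegram_id: Optional[int] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Build the event name and fields for one command or callback."""
    if kind not in ("command", "callback"):
        raise ValueError(f"unknown interaction kind: {kind}")
    if user_id is None and telegram_id is None:
        raise ValueError("need user_id, telegram_id, or both")
    fields: Dict[str, Any] = {"service": SERVICE, kind: name}
    # /start can run before the user row exists, so user_id may be missing;
    # telegram_id keeps the event attributable in that case.
    if user_id is not None:
        fields["user_id"] = user_id
    if telegram_id is not None:
        fields["telegram_id"] = telegram_id
    return f"{kind}_executed", fields


help_event, help_fields = interaction_event("command", "help", user_id=42)
start_event, start_fields = interaction_event("command", "start", telegram_id=987)
```

A handler would then call `logger.info(help_event, **help_fields)`, and a dashboard query can rely on the same field names everywhere.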

Vibe coding vs Task-driven: the comparison

	Vibe Coding	Task-Driven
Start	"Refactor this for me"	TASK-2: Refactor handle_message, 4 subtasks defined
Scope	The AI decides what to change	You define what gets touched and what doesn't
Validation	"Looks good"	DoDs: tests pass, consistent pattern, no regressions
Traceability	Chat history (if it wasn't deleted)	Tasks in git with status, plan, and notes
Rollback	Ctrl+Z and pray	Each subtask is an atomic commit
Enterprise-ready	No	Yes

ExamGenius was chapter 1 — vibe coding works for prototypes and hackathons. Finia is chapter 2 — task-driven for projects that need to be maintained. The enterprise is chapter 3 — where this approach isn't a nice-to-have but a requirement.

The control is yours

This isn't about stopping using AI. It's about using it well.

Task-driven development isn't bureaucracy. It's giving the AI (and yourself) the structure to ship with confidence. You define the work, set the criteria, review the execution. The AI is incredibly capable — but it needs direction, just like any team member.

The developer who defines work well ships better software. The industry has moved forward; it's time our processes caught up.

Finia is a project I'm working on to simplify expense tracking in Costa Rica. If you're interested in trying it out or learning more, reach out — it's in beta and I'm always looking for feedback from early adopters.

I Stopped Vibe Coding and Started Shipping: Task-Driven Development with AI

Luis A. Obando — Fri, 20 Mar 2026 13:34:07 +0000

A year ago I wrote about how I built ExamGenius with vibe coding and Claude. The premise was simple: you tell the AI what you want, it generates the code, you adjust, and in 20 hours you have a complete app that would normally take weeks. An 85% reduction in time. Working code. All good.

A year later, the industry has advanced by leaps and bounds. Models are more capable, IDEs integrate AI natively, and more and more teams are shipping AI-generated code to production. But there is a problem nobody wants to admit: we are generating massive amounts of code that nobody really reviewed.

It feels productive. But the result is often code that is hard to maintain, has no tests, lacks clear structure, and offers no traceability of why certain decisions were made. Vibe coding feels good until you have to debug something and neither you nor the AI remembers why it exists.

This post is about what comes after vibe coding.

The real problem: without structure, even the best dev creates noise

Let's be honest: AI produces code that many senior devs couldn't write as quickly or as cleanly. The problem isn't the quality of the code; it's the lack of direction.

Without clear tasks, the AI wanders. You tell it "fix this bug" and it refactors half the module. Without a defined scope, an "improve this" turns into a 15-file change nobody asked for. The dev loses visibility into what was done, what is left, and what broke along the way. It isn't a competence problem; it's a context problem. The AI doesn't know what decisions you have already made, what patterns your codebase follows, or what the real scope of your request is.

With ExamGenius it worked because it was a greenfield project: a single session, no legacy code, no other collaborators. But in real projects that evolve week after week, pure vibe coding doesn't scale.

And in the enterprise, where there are teams, code reviews, compliance, and real production systems, it simply falls short. You can't show up to a PR review saying "the AI generated it and it looked fine." You need to be able to explain what was done, why, and under what criteria it was validated.

The alternative: Task-Driven Development

The idea is simple: define the work before executing it.

Instead of opening the chat and saying "do this for me," you define a task with a clear scope, break it into subtasks if necessary, and give the AI the precise context to execute each piece. The flow looks like this:

1. Spec: you define what you want to achieve and why
2. Tasks: you break the work into manageable units
3. Subtasks: if a task is large, you split it further
4. Plan: before coding, the AI investigates the codebase and writes an implementation plan in the task. You review and approve it before it touches a line of code
5. Execution: the AI works within the defined scope
6. Review: you validate before moving on

Step 4 is key, and many people skip it. Asking the AI to plan before executing gives you a review checkpoint that is worth its weight in gold. The plan reflects the current state of the codebase; it isn't theory, it's a concrete proposal you can approve, adjust, or reject. It's the difference between "do it" and "tell me how you would do it, and then do it."

The programmer goes back to being architect and reviewer, not spectator. You decide what gets done; the AI executes. And if something doesn't convince you, you correct it before it spreads.

And here is something that closes the loop: the AI doesn't just execute the tasks, it also helps you define them. You can ask it to suggest subtasks, identify edge cases, or propose DoDs based on the project context. You give the general direction, the AI helps you specify it in enough detail, and then it executes against that specification. It's a collaborative cycle where each side contributes what it does best.

This applies to side projects and enterprise teams alike. The difference is that in the enterprise it isn't optional.

Definition of Done: the AI has to meet criteria too

This is where things get interesting. Each task can carry a set of Definition of Done criteria (DoDs): conditions that must be met before it is marked complete.

This isn't bureaucracy. It's verifiable quality:

Tests pass
No regressions
Code follows the project's patterns
Translations are complete
Logging is consistent

The AI stops being a black box that "generates code" and becomes an executor that has to meet concrete criteria. If it doesn't meet them, the task isn't finished.

In an enterprise context, DoDs are the bridge between "the AI generated code" and "this code is ready for production." They are the evidence that someone (human or AI) validated that the work meets the team's standards.

Backlog.md: the backlog that lives in your repo

The tool I use for this is Backlog.md, a task management system that lives directly in your repository as markdown files.

Why not Jira or Trello? Because the AI can't read your Jira board. Backlog.md integrates with the IDE via MCP (Model Context Protocol), which means the AI can:

Read pending tasks
Understand the context of what it has to do
Update the status when it finishes
Check the DoDs before marking something as Done

Everything is versioned in git. Every task, every decision, every status change is a commit. Traceability is complete: you can see exactly what was done, when, and in what order.

You don't need to leave the code to manage your work. The backlog is right there, next to your src/.

For personal projects and small teams, Backlog.md is ideal for its simplicity. In the enterprise, the concept is the same but the tool would connect to existing tooling such as Jira, Linear, or Azure DevOps. What matters isn't the specific tool, but that the AI has access to the backlog, can read the context of the tasks, and can validate the DoDs. The task-driven development pattern works the same regardless of where the tasks live.

Real-world case: Finia

Finia is a Telegram bot I'm building for personal expense tracking in Costa Rica. It parses bank notifications, categorizes expenses with AI, generates reports, and supports multiple languages. The backend is Python with SQLAlchemy, runs on Kubernetes, and uses structlog for structured logging.

I did all the recent development with task-driven development. Here are three concrete examples:

Refactoring handle_message

The main message handler was a 236-line monolith: a single async def with all the logic for budget input, edit amount, rate limiting, chat agent, notifications, and tutorial. It worked, but it was impossible to test or modify without breaking something.

I created TASK-2 with 4 subtasks:

Extract budget input handler
Extract edit amount handler
Extract chat agent invocation
Extract notification sending and tutorial detection

The AI executed each subtask separately. I reviewed each step. The result: 7 focused functions and a handle_message of ~40 lines that only orchestrates. All 79 tests passed without changes.

# Before: 236 lines in a single function
async def handle_message(update, context):
    ...  # everything mixed together

# After: clean orchestrator
async def handle_message(update, context):
    user = await get_user(update.effective_user.id)
    if user is None:
        return
    text = update.message.text.strip()

    if await _handle_budget_input(update, context, user, text):
        return
    if await _handle_edit_amount(update, context, user, text):
        return
    if not await _check_rate_limit(update, user):
        return

    response = await _run_chat_agent(update, user, text)
    await _send_expense_notifications(update, context, user, response)

i18n sweep

Finia supports Spanish and English. But after several development iterations, there were ~50 hardcoded English messages scattered throughout the code: buttons, errors, labels, prompts.

I created TASK-4 and the AI made three passes:

Error messages and buttons in callbacks.py, commands.py, chat.py
Usage blocks (/expense, /sinpe), budget displays, category labels
Change the pre-registration default language to Spanish (most users are Costa Rican)

The DoD was clear: all user-facing messages translated, admin commands intentionally excluded. Each pass was reviewed before continuing.

Interaction tracking

I needed metrics for a CloudWatch dashboard. I created TASK-5: add command_executed to every command handler and callback_executed to the callback handler.

The pattern had to be consistent:

logger.info("command_executed", service="finia-bot", command="help", user_id=user.id)
logger.info("callback_executed", service="finia-bot", callback="chcat", user_id=user.id)

During review, I noticed that /start was using telegram_id instead of user_id (because the user doesn't exist yet). We fixed it to include both. I also found that commands.py had not been included in the commit: the AI had edited it but not staged it. Without step-by-step review, that would have gone to production incomplete.

Vibe coding vs Task-driven: the comparison

	Vibe Coding	Task-Driven
Start	"Do a refactor of this for me"	TASK-2: Refactor handle_message, 4 subtasks defined
Scope	The AI decides what to change	You define what gets touched and what doesn't
Validation	"Looks good"	DoDs: tests pass, consistent pattern, no regressions
Traceability	Chat history (if it wasn't deleted)	Tasks in git with status, plan, and notes
Rollback	Ctrl+Z and pray	Each subtask is an atomic commit
Enterprise-ready	No	Yes

ExamGenius was chapter 1: vibe coding works for prototypes and hackathons. Finia is chapter 2: task-driven for projects that need to be maintained. The enterprise is chapter 3, where this approach isn't a nice-to-have but a requirement.

The control is yours

This isn't about no longer using AI. It's about using it well.

Task-driven development isn't bureaucracy. It's giving the AI (and yourself) the structure to deliver with confidence. You define the work, set the criteria, and review the execution. The AI is incredibly capable, but it needs direction, just like any member of a team.

The programmer who defines the work well delivers better software. The industry has moved forward; it's time for our processes to move forward too.

Finia is a project I'm working on to simplify expense tracking in Costa Rica. If you're interested in trying it or learning more, write to me: it's in beta and I'm always looking for feedback from early adopters.

From Idea to Web App: Building ExamGenius with Vibe Coding and Claude

Luis A. Obando — Tue, 01 Apr 2025 03:57:05 +0000

In software development, the search for more efficient ways to bring ideas to life never stops. Recently, I completed a project that radically changed my perspective on web app development: ExamGenius, an application that generates practice exams from documents uploaded by students, using generative AI.

What makes this project special isn't just its functionality, but how it was built: using a "Vibe Coding" approach with Anthropic's Claude as my development copilot. In this article, I'll share this development experience, the challenges we faced, and how generative AI is changing the software development landscape.

What is "Vibe Coding"?

Before diving into the technical details, let me explain what I mean by "Vibe Coding." It's a collaborative programming approach where a human and an AI work together, with the human providing strategic direction, requirements, and adjustments, while the AI handles much of the code generation and implementation of design patterns.

It's different from simply using a coding assistant. With Vibe Coding, you're building an entire system in collaboration with AI, discussing architecture, making design decisions, and reviewing code together, almost like pair programming with an extremely competent digital partner.

ExamGenius: The Project

ExamGenius is a web application that allows students to upload one or more PDF documents or photographs of textbooks or notes. The application uses AI to extract the text and generate a personalized practice exam that can be downloaded in PDF format.

Key requirements included:

A serverless microservices architecture on AWS
Compliance with 12-factor app principles
Modern frontend developed with Next.js
Backend based on Lambda, S3, Step Functions, and Bedrock

Technical Architecture

The solution we built consists of two main components:

Serverless Backend

We built a robust microservices architecture using AWS Lambda with the following components:

Upload Service: Receives documents and stores them in S3, initiating a Step Functions workflow.
Extract Service: Uses Amazon Textract to extract text from PDF documents and images.
Generate Service: Employs Amazon Bedrock (Claude) to generate exams based on extracted content.
PDF Service: Converts the generated exam to downloadable PDF format.
Status Service: Provides real-time progress updates.

All infrastructure was defined as code using Terraform, following IaC best practices.
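
As a rough mental model of how the services listed above fit together, the flow can be simulated locally as a chain of stages with a shared status record. This is only a sketch: the stage bodies are stand-ins (the article does not show the real Lambda code), the upload step is omitted, and `run_workflow` and the status strings are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


# Stand-ins for the real services; each takes and returns a string payload.
def extract(document: str) -> str:
    return document.strip().lower()


def generate(text: str) -> str:
    return f"Q1. Summarize: {text}"


def render_pdf(exam: str) -> str:
    return f"<pdf>{exam}</pdf>"


STAGES: List[Tuple[str, Callable[[str], str]]] = [
    ("extract", extract),
    ("generate", generate),
    ("pdf", render_pdf),
]


def run_workflow(job_id: str, document: str, status: Dict[str, str]) -> str:
    """Run the stages in order, recording progress a status service could report."""
    payload = document
    for index, (name, stage) in enumerate(STAGES, start=1):
        status[job_id] = f"{name} ({index}/{len(STAGES)})"
        payload = stage(payload)
    status[job_id] = "completed"
    return payload


job_status: Dict[str, str] = {}
result = run_workflow("job-1", "  Photosynthesis Notes ", job_status)
```

In the real system the orchestration is done by Step Functions rather than a Python loop; the sketch only shows the data flow and where progress updates come from.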

Next.js Frontend

The frontend uses Next.js 14 with App Router and follows a modern design with:

Tailwind CSS for styling
React Query for server-state management and synchronization
Radix UI for accessible components
React Hook Form for form handling

We implemented an attractive, responsive design that guides the user through the exam generation process and displays real-time progress updates.

The Vibe Coding Process with Claude

Now, let's look at the actual process of building ExamGenius. Unlike traditional development, where I would have started by writing detailed documentation, setting up repositories, and manually creating each component, with Claude we adopted a more fluid and exploratory approach.

1. Initial Definition

I started with a simple definition:

I want to create a GenAI-based web application called "ExamGenius" that allows 
students to upload one or more PDF documents or photographs of textbooks or notebooks 
and based on this information creates a practice exam that can be downloaded as a PDF.

Claude immediately proposed a serverless microservices architecture on AWS, with key components like Amazon Bedrock, Amazon Textract, S3, Lambda, and Step Functions. From there, we began to iterate.

2. Architecture and Flow Definition

Within minutes, Claude produced detailed Mermaid diagrams visualizing both the microservices architecture and the application workflow.

3. Backend Development

With the architecture agreed upon, we proceeded to build each Lambda service. Claude generated complete and well-structured code for each component, including:

Proper error handling
Step Functions configuration
Terraform code for infrastructure
Packaging scripts

Interestingly, Claude could maintain an extraordinarily coherent context across multiple services, ensuring everything integrated correctly.

4. Frontend Development

In parallel, we developed a modern frontend using Next.js. Claude surprised me with its ability to follow current best practices:

Component-based architecture with the Next.js App Router
Custom hooks for business logic
Asynchronous loading pattern with loading/success/error states
Real-time visual feedback of progress

Here's an example of an upload component that Claude generated:

'use client';

import React, { useState, useCallback } from 'react';
import { useDropzone } from 'react-dropzone';
import { FileUp, X, AlertCircle, FileText } from 'lucide-react';
import { Button } from '@/components/ui/button';
import { Card, CardContent, CardFooter, CardHeader, CardTitle } from '@/components/ui/card';
import { validateFile } from '@/utils/validation';
import { formatFileSize } from '@/utils/format';
import { useUpload } from '@/hooks/use-upload';

export function UploadForm({ onSuccess, className }) {
  const [files, setFiles] = useState([]);
  const { uploadFiles, isUploading, error, result } = useUpload();

  const onDrop = useCallback((acceptedFiles) => {
    const validFiles = acceptedFiles.filter(file => validateFile(file).valid);
    setFiles(prevFiles => [...prevFiles, ...validFiles]);
  }, []);

  // Remaining implementation...
}

5. Iteration and Refinement

One of the most valuable aspects of the Vibe Coding approach was the ability to iterate quickly. For example, when we realized there was no jobs/status endpoint for querying job status, I simply mentioned the gap and Claude generated a complete implementation, including the IAM permissions it needed.

Challenges and Learnings

This approach wasn't without challenges:

1. CORS Issues

We faced typical CORS problems when developing locally. Claude not only correctly identified the issue but proposed multiple solutions:

Configure CORS in API Gateway
Set up CORS in Lambda responses
Implement a temporary proxy in Next.js
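
For the second option, here is a minimal sketch of a Lambda handler that attaches CORS headers to its responses. It assumes a Python runtime and an API Gateway REST API with proxy integration (hence `httpMethod` in the event); the article does not state either, and the origin, header list, and function names are illustrative.

```python
import json
from typing import Any, Dict

# Assumed local dev origin; in practice this would come from configuration.
ALLOWED_ORIGIN = "http://localhost:3000"


def cors_response(status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build a proxy-integration response that carries CORS headers."""
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
            "Access-Control-Allow-Headers": "Content-Type,Authorization",
            "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
        },
        "body": json.dumps(body),
    }


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # Browsers send an OPTIONS preflight before many cross-origin requests;
    # answer it without running any business logic.
    if event.get("httpMethod") == "OPTIONS":
        return cors_response(200, {})
    return cors_response(200, {"status": "ok"})
```

Routing every response through one helper means error responses get the headers too; without them the browser reports a CORS failure instead of the real error.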

2. IAM Permissions

A frequent challenge in AWS applications is the correct configuration of permissions. When the status Lambda couldn't list Step Functions executions, Claude identified that we needed a more specific IAM policy:

{
  Effect = "Allow"
  Action = [
    "states:ListExecutions"
  ]
  Resource = [
    aws_sfn_state_machine.exam_generation_workflow.arn
  ]
}

3. Styles Loading

When we had issues with styles loading in the frontend, Claude provided comprehensive solutions, including checks of the global.css file, Tailwind configuration, and import alternatives.

Comparison with Traditional Development

To put the value of this approach in perspective, let's compare the time and resources used:

Traditional Development:

Creation of detailed documentation: 1-2 days
Repository and environment setup: 0.5 days
AWS backend development: 5-7 days
Frontend development: 4-5 days
Testing and integration: 2-3 days

Total: 12.5-17.5 days (100-140 hours)

Vibe Coding with Claude:

Initial definition and architecture: 2 hours
Backend development: 8 hours
Frontend development: 6 hours
Iteration and corrections: 4 hours

Total: 20 hours

This comparison is revealing: the time required dropped by roughly 80-86% (20 hours against an estimated 100-140), without compromising code quality or best practices.
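
The percentage can be checked directly from the two estimates above. The short calculation below assumes an 8-hour working day, which is what makes 12.5 days equal 100 hours; the variable names are mine.

```python
HOURS_PER_DAY = 8  # assumed working day: 12.5 days * 8 = 100 hours

# (low, high) estimates in days for the traditional approach.
traditional_days = {
    "documentation": (1, 2),
    "repo and environment setup": (0.5, 0.5),
    "backend": (5, 7),
    "frontend": (4, 5),
    "testing and integration": (2, 3),
}
# Hours actually spent with the Vibe Coding approach.
vibe_hours = {"definition": 2, "backend": 8, "frontend": 6, "iteration": 4}

low_days = sum(low for low, _ in traditional_days.values())
high_days = sum(high for _, high in traditional_days.values())
low_hours = low_days * HOURS_PER_DAY
high_hours = high_days * HOURS_PER_DAY
vibe_total = sum(vibe_hours.values())


def reduction(baseline_hours: float, actual_hours: float) -> float:
    """Percentage of time saved relative to the baseline estimate."""
    return 100 * (1 - actual_hours / baseline_hours)


low_saving = round(reduction(low_hours, vibe_total), 1)
high_saving = round(reduction(high_hours, vibe_total), 1)
```

Against the 100-hour estimate the saving is 80%, and against the 140-hour estimate it is about 85.7%, so a figure of about 85% corresponds to the upper end of the traditional estimate.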

Conclusions: The Future of Software Development

This experience has led me to several important conclusions:

1. The Developer Role is Evolving

Instead of writing every line of code, the future developer might focus more on:

Clearly defining problems
Making strategic architectural decisions
Validating and refining generated code
Providing the business context that AI lacks

2. Democratization of Development

AI tools like Claude are democratizing software development, allowing individuals with limited technical knowledge to build complex solutions.

3. Code Quality

Contrary to what might be expected, the generated code followed good practices, was well-structured and documented, and used modern patterns.

4. Speed with Quality

Perhaps most impressively: we didn't have to choose between speed and quality. We got both.

Is This the Future?

ExamGenius is just one example of what's possible with AI-assisted programming. While this approach has limitations (especially in very large or highly specialized projects), it clearly shows a future where developers and AI work in tandem, combining human creativity and context with AI speed and precision.

The next time you have an idea for a web application, consider the Vibe Coding approach. It's not about replacing developers but empowering them, allowing them to focus on what they do best: solving real problems for real users.

Have you experimented with similar approaches in your development? What have been your experiences with AI programming tools? Share your thoughts in the comments.

From Idea to Web App: Building ExamGenius with Vibe Coding and Claude

Luis A. Obando — Tue, 01 Apr 2025 02:33:29 +0000

En el dinámico mundo del desarrollo de software, encontrar formas más eficientes de materializar ideas ha sido siempre una búsqueda constante. Recientemente, completé un proyecto que transformó radicalmente mi perspectiva sobre el desarrollo de aplicaciones web: ExamGenius, una aplicación que permite generar exámenes de práctica a partir de documentos subidos por estudiantes, utilizando IA generativa.

Lo que hace a este proyecto especial no es solo su funcionalidad, sino cómo se construyó: utilizando un enfoque de "Vibe Coding" con Claude de Anthropic como mi copiloto de desarrollo. En este artículo, compartiré esta experiencia de desarrollo, los desafíos que enfrentamos, y cómo la IA generativa está cambiando el panorama del desarrollo de software.

¿Qué es "Vibe Coding"?

Antes de sumergirme en los detalles técnicos, permítanme explicar lo que entiendo por "Vibe Coding". Es un enfoque de programación colaborativa donde un humano y una IA trabajan juntos, el humano proporcionando la dirección estratégica, requisitos y ajustes, mientras la IA maneja gran parte de la generación de código y la implementación de patrones de diseño.

Es diferente de simplemente usar un asistente de codificación. Con Vibe Coding, estás construyendo un sistema completo en colaboración con la IA, discutiendo arquitectura, tomando decisiones de diseño y revisando código juntos, casi como pair programming con un compañero digital extremadamente competente.

ExamGenius: The Project

ExamGenius is a web application that lets students upload one or more PDF documents or photographs of textbooks or notes. The application uses AI to extract the text and generate a personalized practice exam that can be downloaded as a PDF.

The key requirements included:

A serverless microservices architecture on AWS
Compliance with the 12-factor app principles
A modern frontend built with Next.js
A backend based on Lambda, S3, Step Functions, and Bedrock

Technical Architecture

The solution we built consists of two main components:

Serverless Backend

We built a solid microservices architecture on AWS Lambda with the following components:

Upload Service: Receives documents and stores them in S3, starting a Step Functions workflow.
Extract Service: Uses Amazon Textract to extract text from PDF documents and images.
Generate Service: Uses Amazon Bedrock (Claude) to generate exams based on the extracted content.
PDF Service: Converts the generated exam into a downloadable PDF.
Status Service: Provides real-time progress updates.

All the infrastructure was defined as code using Terraform, following IaC best practices.
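
As a rough illustration of how the extract, generate, and PDF services could be chained in Step Functions, here is a minimal sketch of an Amazon States Language definition written as a TypeScript object. The state names, region, account ID, and function names are hypothetical placeholders, not the actual ExamGenius configuration.

```typescript
// Hypothetical sketch: the extract -> generate -> PDF chain expressed as an
// Amazon States Language definition. All ARNs and state names are placeholders.

type TaskState = {
  Type: "Task";
  Resource: string;
  Next?: string;
  End?: boolean;
};

type StateMachineDefinition = {
  Comment: string;
  StartAt: string;
  States: Record<string, TaskState>;
};

// Placeholder ARN builder; a real deployment would take these from Terraform.
const lambdaArn = (functionName: string): string =>
  `arn:aws:lambda:us-east-1:123456789012:function:${functionName}`;

const examWorkflow: StateMachineDefinition = {
  Comment: "ExamGenius document-to-exam pipeline (illustrative)",
  StartAt: "ExtractText",
  States: {
    ExtractText: { Type: "Task", Resource: lambdaArn("extract-service"), Next: "GenerateExam" },
    GenerateExam: { Type: "Task", Resource: lambdaArn("generate-service"), Next: "RenderPdf" },
    RenderPdf: { Type: "Task", Resource: lambdaArn("pdf-service"), End: true },
  },
};

// Walks the definition from StartAt and returns state names in execution order.
function executionOrder(def: StateMachineDefinition): string[] {
  const order: string[] = [];
  let current: string | undefined = def.StartAt;
  while (current !== undefined && !order.includes(current)) {
    order.push(current);
    current = def.States[current]?.Next;
  }
  return order;
}
```

Serializing `examWorkflow` with `JSON.stringify` would give the JSON document a state machine definition expects; in a Terraform setup like the one described above, that JSON would normally live in the IaC code instead.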

Frontend with Next.js

The frontend uses Next.js 14 with the App Router and follows a modern design with:

Tailwind CSS for styling
React Query for state management and synchronization
Radix UI for accessible components
React Hook Form for form handling

We implemented an attractive, responsive design that guides the user through the exam generation process and shows real-time progress updates.

The Vibe Coding Process with Claude

Now let's look at what building ExamGenius was actually like. In traditional development I would have started by writing detailed documentation, setting up repositories, and creating each component by hand. With Claude, we took a more fluid, exploratory approach.

1. Initial Definition

I started with a simple definition:

I want to create a GenAI-based web application called "ExamGenius".
This app lets students upload one or more PDF documents or photographs
of textbooks or of their notebook and, based on this information, creates a
practice exam that can be downloaded as a PDF.

Claude immediately proposed a serverless microservices architecture on AWS, with key components such as Amazon Bedrock, Amazon Textract, S3, Lambda, and Step Functions. From there, we began to iterate.

2. Architecture and Flow Definition

Within minutes, Claude produced detailed Mermaid diagrams to visualize both the microservices architecture and the application workflow.

3. Backend Development

With the architecture agreed upon, we went on to build each Lambda service. Claude generated complete, well-structured code for each component, including:

Proper error handling
Step Functions configuration
Terraform code for the infrastructure
Packaging scripts

What was interesting is that Claude kept a remarkably coherent context across multiple services, ensuring that everything integrated correctly.

4. Frontend Development

In parallel, we built a modern frontend using Next.js. Claude surprised me with its ability to follow current best practices:

App Router based on React components
Custom hooks for business logic
Asynchronous loading pattern with loading/success/error states
Real-time visual feedback on progress
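
The loading/success/error pattern from the list above can be sketched as a small reducer over a discriminated union. This is a minimal illustration, not code from ExamGenius: the type names, action names, and the `jobId` field are assumptions.

```typescript
// Hypothetical state model for an async upload; all names are illustrative.
type UploadState =
  | { phase: "idle" }
  | { phase: "loading"; progress: number }
  | { phase: "success"; jobId: string }
  | { phase: "error"; message: string };

type UploadAction =
  | { type: "start" }
  | { type: "progress"; percent: number }
  | { type: "done"; jobId: string }
  | { type: "fail"; message: string };

function uploadReducer(state: UploadState, action: UploadAction): UploadState {
  switch (action.type) {
    case "start":
      return { phase: "loading", progress: 0 };
    case "progress":
      // Progress only makes sense while loading; ignore stray events otherwise.
      return state.phase === "loading"
        ? { phase: "loading", progress: Math.min(100, Math.max(0, action.percent)) }
        : state;
    case "done":
      return { phase: "success", jobId: action.jobId };
    case "fail":
      return { phase: "error", message: action.message };
  }
}
```

Because each phase carries only the fields that are valid for it, a component cannot read `jobId` before the upload has succeeded without the compiler complaining.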

Here is an example of an upload component that Claude generated:

'use client';

import React, { useState, useCallback } from 'react';
import { useDropzone } from 'react-dropzone';
import { FileUp, X, AlertCircle, FileText } from 'lucide-react';
import { Button } from '@/components/ui/button';
import { Card, CardContent, CardFooter, CardHeader, CardTitle } from '@/components/ui/card';
import { validateFile } from '@/utils/validation';
import { formatFileSize } from '@/utils/format';
import { useUpload } from '@/hooks/use-upload';

export function UploadForm({ onSuccess, className }) {
  const [files, setFiles] = useState([]);
  const { uploadFiles, isUploading, error, result } = useUpload();

  const onDrop = useCallback((acceptedFiles) => {
    const validFiles = acceptedFiles.filter(file => validateFile(file).valid);
    setFiles(prevFiles => [...prevFiles, ...validFiles]);
  }, []);

  // Remaining implementation...
}
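
The component above relies on a `validateFile` helper that returns an object with a `valid` flag, but the helper itself is not shown. A minimal sketch of what such a helper might look like follows; the accepted MIME types, the 10 MiB limit, and the `reason` field are assumptions chosen for illustration.

```typescript
// Hypothetical sketch of a validateFile helper; the size limit and accepted
// MIME types are assumptions, not values taken from ExamGenius.
interface FileLike {
  name: string;
  size: number; // bytes
  type: string; // MIME type
}

interface ValidationResult {
  valid: boolean;
  reason?: string;
}

const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumed 10 MiB cap
const ACCEPTED_TYPES = ["application/pdf", "image/jpeg", "image/png"];

function validateFile(file: FileLike): ValidationResult {
  if (!ACCEPTED_TYPES.includes(file.type)) {
    return { valid: false, reason: `Unsupported file type: ${file.type || "unknown"}` };
  }
  if (file.size === 0) {
    return { valid: false, reason: "File is empty" };
  }
  if (file.size > MAX_FILE_BYTES) {
    return { valid: false, reason: "File exceeds the size limit" };
  }
  return { valid: true };
}
```

The browser `File` object already has `name`, `size`, and `type`, so it satisfies `FileLike` without any conversion.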

And here are some images of the final generated design.

5. Iteration and Refinement

One of the most valuable aspects of the Vibe Coding approach was the ability to iterate quickly. When we realized, for example, that the jobs/status endpoint for querying job status was missing, I simply mentioned the problem and Claude generated a complete implementation, including the appropriate IAM permissions.
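
To give a feel for what a jobs/status endpoint has to do, here is a minimal sketch of one piece of it: translating a Step Functions execution status into a simpler job state for the frontend. The execution status values are the ones Step Functions reports; the response shape and the three job states are assumptions for illustration, not the actual ExamGenius API.

```typescript
// Statuses reported by Step Functions for an execution.
type ExecutionStatus =
  | "RUNNING"
  | "SUCCEEDED"
  | "FAILED"
  | "TIMED_OUT"
  | "ABORTED"
  | "PENDING_REDRIVE";

// Hypothetical response shape for the jobs/status endpoint.
interface JobStatusResponse {
  jobId: string;
  state: "processing" | "completed" | "failed";
}

function toJobStatus(jobId: string, executionStatus: ExecutionStatus): JobStatusResponse {
  switch (executionStatus) {
    case "RUNNING":
    case "PENDING_REDRIVE":
      // Still in flight (or about to be retried): keep the client polling.
      return { jobId, state: "processing" };
    case "SUCCEEDED":
      return { jobId, state: "completed" };
    case "FAILED":
    case "TIMED_OUT":
    case "ABORTED":
      return { jobId, state: "failed" };
  }
}
```

Collapsing the failure variants into a single `failed` state keeps the polling logic on the client simple, at the cost of hiding why a job failed.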

Challenges and Lessons Learned

This approach was not without its challenges:

1. CORS Issues

We ran into the typical CORS problems when developing locally. Claude not only identified the problem correctly but also proposed several solutions:

Configure CORS in API Gateway
Configure CORS in the Lambda responses
Implement a temporary proxy in Next.js
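
The second option, setting CORS headers in the Lambda responses, can be sketched as a small wrapper around the proxy-integration response object. This is an illustrative sketch under assumptions: the allowed origin list, the allowed headers and methods, and the `withCors` name are hypothetical, not taken from the project.

```typescript
// Shape of a Lambda proxy integration response as API Gateway expects it.
interface LambdaProxyResponse {
  statusCode: number;
  headers?: Record<string, string>;
  body: string;
}

// Assumed allow-list: the default local Next.js dev server origin.
const ALLOWED_ORIGINS = ["http://localhost:3000"];

function withCors(
  response: LambdaProxyResponse,
  requestOrigin: string | undefined
): LambdaProxyResponse {
  if (requestOrigin === undefined || !ALLOWED_ORIGINS.includes(requestOrigin)) {
    // Unknown origin: return the response untouched, so the browser blocks it.
    return response;
  }
  return {
    ...response,
    headers: {
      ...response.headers,
      "Access-Control-Allow-Origin": requestOrigin,
      "Access-Control-Allow-Headers": "Content-Type",
      "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
      Vary: "Origin",
    },
  };
}
```

Echoing back only origins from an allow-list is a deliberate choice here; a wildcard `*` is simpler during local development but is easy to forget to tighten later.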

2. IAM Permissions

A common challenge in AWS applications is configuring permissions correctly. When the status Lambda could not list Step Functions executions, Claude identified that we needed a more specific IAM policy:

{
  Effect = "Allow"
  Action = [
    "states:ListExecutions"
  ]
  Resource = [
    aws_sfn_state_machine.exam_generation_workflow.arn
  ]
}

3. Style Loading

When we had problems loading styles in the frontend, Claude provided thorough solutions, including checks of the global.css file, the Tailwind configuration, and alternative ways of importing the styles.

Comparison with Traditional Development

To put the value of this approach in perspective, let's compare the time and resources used:

Traditional Development:

Writing detailed documentation: 1-2 days
Repository and environment setup: 0.5 days
Backend development with AWS: 5-7 days
Frontend development: 4-5 days
Testing and integration: 2-3 days
Total: 12.5-17.5 days (100-140 hours)

Vibe Coding with Claude:

Initial definition and architecture: 2 hours
Backend development: 8 hours
Frontend development: 6 hours
Iteration and fixes: 4 hours
Total: 20 hours

This comparison is revealing: the time required fell from 100-140 hours to 20 hours, a reduction of roughly 80-86% (approximately 85% against the upper estimate), without compromising code quality or best practices.
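
The arithmetic behind that percentage can be checked with a few lines of TypeScript. This sketch only restates the estimates listed above; the 8-hour working day is an assumption implied by the 100-140 hour figure.

```typescript
const HOURS_PER_DAY = 8; // assumed: 12.5-17.5 days is quoted as 100-140 hours

// Traditional estimate, in days (minimum and maximum per task).
const traditionalDays = [
  { task: "documentation", min: 1, max: 2 },
  { task: "repository and environment", min: 0.5, max: 0.5 },
  { task: "backend", min: 5, max: 7 },
  { task: "frontend", min: 4, max: 5 },
  { task: "testing and integration", min: 2, max: 3 },
];

// Vibe Coding estimate, in hours per phase.
const vibeCodingHours = [2, 8, 6, 4];

function sum(values: number[]): number {
  return values.reduce((total, value) => total + value, 0);
}

// Percentage of time saved when going from `before` hours to `after` hours.
function reductionPercent(before: number, after: number): number {
  return ((before - after) * 100) / before;
}

const traditionalMinHours = sum(traditionalDays.map((t) => t.min)) * HOURS_PER_DAY; // 100
const traditionalMaxHours = sum(traditionalDays.map((t) => t.max)) * HOURS_PER_DAY; // 140
const vibeHours = sum(vibeCodingHours); // 20

const lowReduction = reductionPercent(traditionalMinHours, vibeHours); // 80
const highReduction = reductionPercent(traditionalMaxHours, vibeHours); // about 85.7
```

So the 85% figure corresponds to the upper end of the traditional estimate; against the lower end the saving is 80%.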

Conclusions: The Future of Software Development

This experience has led me to several important conclusions:

1. The Developer's Role Is Evolving

Instead of writing every line of code, the developer of the future may focus more on:

Defining problems clearly
Making strategic architectural decisions
Validating and refining the generated code
Providing the business context the AI lacks

2. Democratization of Development

AI tools like Claude are democratizing software development, allowing people with limited technical knowledge to build complex solutions.

3. Code Quality

Contrary to what one might expect, the generated code followed good practices, was well structured and documented, and used modern patterns.

4. Speed with Quality

Perhaps most impressive of all: we did not have to choose between speed and quality. We got both.

Is This the Future?

ExamGenius is just one example of what is possible with AI-assisted programming. While this approach has limitations (especially in very large or highly specialized projects), it clearly points to a future where developers and AI work in tandem, combining human creativity and context with the speed and precision of AI.

The next time you have an idea for a web application, consider the Vibe Coding approach. It is not about replacing developers but about empowering them, letting them focus on what they do best: solving real problems for real users.

Have you experimented with similar approaches in your development? What have been your experiences with AI programming tools? Share your thoughts in the comments.