Wilson Xu

How AI Agents Changed My Development Workflow: 192 Parallel Tasks in One Session

A developer's honest account of running an autonomous AI agent pipeline — what scaled, what broke, and what I actually learned about the future of AI-assisted development.


The Experiment Nobody Asked For

It started as a simple question: what happens if you let an AI orchestrate its own development pipeline?

Not a single prompt. Not a chatbot helping you write a function. I mean a full autonomous system — a main orchestrator spawning specialized sub-agents, each with a defined role, running in parallel, building tools, writing articles, hunting bounties, and managing its own task queue.

Over the course of one session, the system spawned 192 agents, published 59 npm tools, drafted 87 articles, and submitted dozens of pitches to publications. The numbers sound impressive. The reality is more nuanced, more instructive, and frankly more interesting than the headline suggests.

This article is not a celebration of scale. It is an honest post-mortem on what happens when you push AI agent orchestration to its limits — and what that teaches us about integrating AI into real development workflows.

The Architecture: Orchestration as a First-Class Concern

The system ran on a surprisingly simple architecture. At the top sat a main orchestrator — a Claude-based agent with access to shell tools, file system operations, browser automation via Playwright, and the ability to spawn sub-agents.

The orchestrator maintained a constant pool of six active agents, each assigned to a specialized role:

  1. Bounty Hunter — scanned GitHub, Algora, and open-source repositories for paid bounties, then drafted and submitted proposals via the gh CLI.
  2. Article Factory — researched topics, wrote long-form technical articles, and saved them to the local pipeline.
  3. Tool Publisher — scaffolded npm CLI tools, wrote implementations, ran tests, and published to the npm registry.
  4. PR Checker — monitored open pull requests, responded to reviewer comments, and pushed fixes.
  5. Revenue Tracker — maintained a JSON ledger of all pipeline activity, tracked pending payments, and flagged bottlenecks.
  6. Article Submitter — took completed articles and submitted them to publications via web forms, APIs, or email pitches using browser automation.

When any agent completed its task, the orchestrator immediately spawned a replacement. The pool never dropped below six. This created a continuous throughput pipeline: as one agent finished its work, the next piece of work began, without human intervention.
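The pool-maintenance logic can be sketched in a few lines. This is an illustrative reconstruction, not the actual orchestrator code — `spawnAgent` stands in for whatever mechanism actually launches a sub-agent:

```javascript
// Minimal sketch of the constant-pool pattern: one agent per role,
// and any missing role gets respawned immediately.
const ROLES = [
  "bounty-hunter", "article-factory", "tool-publisher",
  "pr-checker", "revenue-tracker", "article-submitter",
];

function refillPool(active, spawnAgent) {
  // For every role missing from the active set, spawn a replacement,
  // so the pool never drops below one agent per role.
  const running = new Set(active.map((a) => a.role));
  const spawned = [];
  for (const role of ROLES) {
    if (!running.has(role)) spawned.push(spawnAgent(role));
  }
  return [...active, ...spawned];
}
```

Run after every task completion, this keeps the pool at exactly six without any scheduler beyond a loop.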

The entire state was managed through the file system. A revenue.json file tracked pipeline status. Log files captured bounty alerts. Completed articles sat in a directory waiting for submission. There was no database, no message queue, no microservice architecture. Just files, shell commands, and an AI reading and writing them.

What 192 Agents Actually Produced

Let me be specific about the output, because the raw numbers tell only part of the story.

59 npm tools published. These ranged from genuinely useful CLI utilities — a README generator that analyzes repository structure, a price monitoring tool, a web snapshot reader — to tools that were essentially boilerplate with a name. The useful ones shared a common trait: they solved a problem I had personally encountered. The mediocre ones were generated from pattern-matching on "what kinds of CLI tools exist on npm" without any grounding in real user needs.

87 articles drafted. Of these, roughly 40 were submitted as pitches to Smashing Magazine, and over 50 were published directly to Dev.to. The quality varied enormously. Articles where the AI had access to real data, real code examples, and a clear thesis were genuinely good. Articles generated from a topic prompt alone read like sophisticated summaries of documentation — technically correct but lacking the editorial voice that makes technical writing compelling.

21 bounty proposals submitted. This was perhaps the most instructive category. The AI could identify bounties, read issue descriptions, analyze codebases, and draft proposals that were technically sound. But bounty hunting is a speed game — Expensify bounties, for example, needed submissions within 30 minutes of issue creation. The pipeline was not fast enough. By the time the agent identified the issue, analyzed the codebase, drafted a proposal, and submitted it, faster human developers had already claimed the work.

Revenue generated: modest. The honest truth is that the direct revenue from this experiment was small. A few dollars from npm downloads. Pending article payments. No bounties won. The value was not in the money — it was in understanding what this technology can and cannot do at scale.

What AI Agents Do Well

After watching 192 agents work, clear patterns emerged about where AI excels in development workflows.

Scaffolding and Boilerplate

AI agents are extraordinarily good at project setup. Creating a new npm package with proper package.json, README, license, .gitignore, test configuration, CI pipeline, and directory structure — an agent does this in seconds with near-perfect consistency. For a human developer, this is fifteen minutes of copy-pasting from previous projects and fixing configuration inconsistencies. For an agent, it is a solved problem.

Repetitive Transformations

When the article factory needed to take a single article concept and produce variations for different platforms — adjusting tone for Dev.to versus a pitch for Smashing Magazine versus a LinkedIn post — the agents performed well. The core content remained consistent while the framing adapted appropriately. This is exactly the kind of task where AI shines: same substance, different packaging.

Code Pattern Application

Given a well-defined pattern — "create a CLI tool that takes a URL, fetches content, processes it, and outputs results" — agents produced working code reliably. The tool publisher agent could scaffold, implement, test, and publish a straightforward CLI tool without intervention. The code was not elegant, but it was functional and followed conventions.

Documentation and Explanation

Agents produced excellent documentation. README files, inline comments, API documentation, usage examples — all generated at a quality level that most human developers would not bother to match. This makes sense: documentation is largely about clearly restating what code does, and AI is very good at reading code and producing clear explanations.

Monitoring and Status Tracking

The revenue tracker and PR checker agents performed their roles admirably. Reading JSON files, checking GitHub API responses, summarizing status, flagging items that needed attention — these are tasks that require attention to detail and tolerance for repetition, which agents have in unlimited supply.

What AI Agents Struggle With

The failures were equally instructive, and more important for anyone considering integrating AI agents into their workflow.

Original Design Decisions

Not once did an agent produce a tool concept that made me think, "I wish I had thought of that." Every tool idea was either derived from existing tools (a price monitor, a README generator, a web scraper) or was so generic as to be uninteresting. AI agents can execute on ideas, but they cannot originate ideas that reflect genuine insight into user needs. This is not a limitation of the current models — it is a fundamental gap between pattern matching and creative problem-solving.

Market Fit and User Empathy

The tool publisher agent would happily build and publish a tool that no one would ever use. It had no mechanism for evaluating whether a tool solved a real problem, whether anyone was searching for it, or whether better alternatives existed. It optimized for output quantity, not output value. Human judgment about what is worth building remains irreplaceable.

Speed-Critical Tasks

The bounty hunting failure was revealing. AI agents are thorough but not fast in the way competitive tasks demand. A human developer who knows a codebase can skim an issue, mentally map the fix, and submit a proposal in minutes. The agent needed to clone the repository, analyze the codebase structure, read relevant files, understand the issue context, draft a proposal, and submit it — a process that took long enough to consistently lose the race.

Browser Automation Reliability

The article submitter agent, which relied on Playwright for browser automation, was the most fragile component. Web forms change. CAPTCHAs appear. Session tokens expire. Multiple agents fighting for the same browser tab created chaos — a lesson learned the hard way. Browser-based automation works for single, well-defined interactions but degrades rapidly when you try to parallelize it or handle edge cases.

Nuanced Communication

Responding to PR review comments required understanding not just what the reviewer said, but what they meant — their concerns about architecture, their preferences for code style, their implicit questions. The PR checker agent could address literal feedback ("rename this variable," "add error handling here") but struggled with comments like "I'm not sure this is the right approach" that required a design discussion rather than a code change.

The Quality Question

Here is the uncomfortable truth about scale: 59 published tools and 87 articles sounds impressive until you ask how many of them are actually good.

I reviewed every tool the pipeline produced. Roughly 10 of the 59 tools were genuinely useful — tools I would recommend to another developer, tools that solved real problems with clean implementations. Another 20 were functional but unremarkable — they worked, but better alternatives existed. The remaining 29 were essentially noise — published packages that added nothing to the ecosystem.

For the articles, the ratio was similar. Perhaps 15 of the 87 articles contained original insights or useful technical content. Another 30 were competent but generic — the kind of content that populates the middle pages of Google search results. The rest were filler.

This is not a criticism of AI — it is a mathematical reality of optimizing for throughput. When you tell a system to maximize output, quality becomes a secondary objective. The lesson is not "AI produces bad work" but rather "unsupervised AI produces work with high variance, and the mean quality is lower than supervised work."

The practical implication: AI agents are most valuable when paired with human curation. Let the agent generate ten options, then pick the best two. Let the agent draft the article, then edit it with your voice. Let the agent scaffold the tool, then redesign the parts that matter.

Cost Analysis: API Usage vs. Output Value

Running 192 agents through a Claude API session is not free. The session consumed substantial API credits — the exact cost depends on token pricing, but for a session generating roughly 100,000 words of output plus all the code, tool interactions, and browser automation, the bill was meaningful.

Against this cost, the direct monetary return was minimal. A few npm packages generating pennies in downloads. Article pitches that may or may not result in payments months later. No successful bounties.

But framing the ROI purely in direct revenue misses the point. The real value was:

  • Learning acceleration. I now understand AI agent orchestration at a practical level that no amount of reading documentation could provide.
  • Pipeline infrastructure. The orchestration system, submission scripts, and monitoring tools built during this session are reusable for future work.
  • Content inventory. Even if most articles need editing, having 87 drafts is a substantial head start on a content pipeline.
  • Tool portfolio. The 10 genuinely good tools represent weeks of development time compressed into hours.

The cost-effectiveness improves dramatically when you apply the lessons learned: use agents for what they are good at, add human judgment where it matters, and do not optimize for raw output volume.

Practical Tips for Developers Adopting AI Workflows

After running this experiment, here is what I would tell any developer looking to integrate AI agents into their workflow.

Start With Defined, Bounded Tasks

Do not start by asking an AI agent to "build me a SaaS product." Start with "scaffold a new Express API with these five endpoints, JWT authentication, and Postgres integration." Bounded tasks with clear success criteria produce reliable results. Open-ended creative tasks produce unreliable results.

Use Agents for the Work You Hate

Every developer has tasks they find tedious — writing tests, updating documentation, migrating configuration files, reviewing dependency updates. These are perfect agent tasks. The quality bar is well-defined, the work is repetitive, and the agent will do it without complaining. Reserve your creative energy for the work that actually requires human judgment.

Serialize Browser Automation

If you are using AI agents with browser automation, run one browser task at a time. Multiple agents competing for the same browser session will corrupt each other's state. Queue browser tasks and process them sequentially while running non-browser tasks in parallel.
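A minimal sequential queue is enough to enforce this, assuming each browser task is an async function. Tasks chain onto a single promise, so only one touches the browser at a time:

```javascript
// Serialize browser tasks: each enqueued task starts only after the
// previous one has finished, while non-browser work runs elsewhere.
function createBrowserQueue() {
  let tail = Promise.resolve();
  const results = [];
  return {
    enqueue(task) {
      tail = tail.then(async () => {
        results.push(await task());
      });
      return tail;
    },
    drain: () => tail.then(() => results),
  };
}
```

In a Playwright setup, each task would own the page for its full duration (navigate, fill, submit) before the next task starts.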

Build Review Into the Pipeline

Never publish agent output without review. Build a review stage into your pipeline — agent produces draft, human reviews, agent incorporates feedback, human approves. This is not a limitation; it is the correct architecture. Even human developers have code review. AI agents should too.
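One way to make the review gate explicit is a small state machine over agent output, so nothing can jump from draft to published without passing through human review. The states here are my own illustration, not the pipeline's actual schema:

```javascript
// Hypothetical review-gate sketch: agent output moves through explicit
// states, and only an approval step can lead to "published".
const TRANSITIONS = {
  draft: ["in-review"],
  "in-review": ["changes-requested", "approved"],
  "changes-requested": ["in-review"],
  approved: ["published"],
};

function advance(item, next) {
  const allowed = TRANSITIONS[item.state] || [];
  if (!allowed.includes(next)) {
    throw new Error(`cannot move from ${item.state} to ${next}`);
  }
  return { ...item, state: next };
}
```

Encoding the gate in data rather than convention means an agent physically cannot skip review, even when it is the one driving the pipeline.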

Track Everything

The revenue tracker agent was one of the most valuable components, not because it generated revenue, but because it provided visibility. When you have multiple agents producing output, you need a centralized view of what has been done, what is pending, and what has failed. A simple JSON file updated after each task is sufficient.

Accept the Quality Distribution

Not everything an agent produces will be good. This is fine. The economics work when the cost of generating ten options is lower than the cost of manually creating two, and the best two of the ten are as good as what you would have created manually. Think of AI agents as a generation engine, not a perfection engine.

Invest in Prompt Engineering for Specialized Agents

The difference between a generic agent prompt and a well-crafted specialized prompt is enormous. The article factory agent with a detailed prompt about tone, structure, and target audience produced dramatically better articles than one given only a topic. Spend time crafting your agent prompts — it is the highest-leverage investment in an AI workflow.

The Future: AI-Assisted vs. AI-Driven Development

This experiment sits at the boundary between AI-assisted and AI-driven development, and the boundary is instructive.

AI-assisted development is what most developers do today: using Copilot for autocompletion, asking ChatGPT to explain an error, using Claude to draft a function. The human drives; the AI assists. This works well and is broadly adopted.

AI-driven development is what this experiment attempted: the AI drives; the human supervises. The orchestrator decided what to build, when to build it, and how to deploy it. The human's role was reduced to monitoring and occasional intervention.

The honest assessment: we are not ready for AI-driven development in production. The quality variance is too high, the judgment gaps are too significant, and the cost of cleaning up bad autonomous decisions often exceeds the cost of making good decisions manually.

But we are very close to a middle ground that I would call AI-accelerated development: the human makes strategic decisions (what to build, for whom, why), and the AI handles execution at scale (scaffolding, implementation, testing, deployment, documentation). The human provides direction and quality control; the AI provides speed and throughput.

This middle ground is where the real productivity gains live. Not in replacing developers, but in giving every developer an army of tireless, fast, and reasonably competent assistants. The developer who learns to direct that army effectively will outproduce teams of developers who do not.

Conclusion

192 agents. 59 tools. 87 articles. One session. The numbers are real, and they are simultaneously impressive and humbling.

Impressive because the raw throughput of an AI agent pipeline is genuinely unprecedented. What would have taken a solo developer months of work was compressed into hours. The scaffolding, the boilerplate, the repetitive transformations — all handled at a speed and consistency that no human could match.

Humbling because throughput is not the same as value. The best tools were the ones where I had a clear vision and the AI executed it. The worst were the ones where the AI generated both the idea and the implementation without human guidance. Scale without direction produces noise.

The future of AI in development is not about replacing human judgment — it is about amplifying it. Give the AI the tedious work. Keep the creative work. Build review into every pipeline. And never confuse output volume with output value.

The developers who thrive in the AI era will not be the ones who automate everything. They will be the ones who know exactly what to automate and what to keep human. That discernment — knowing where the machine ends and the human begins — is the most valuable skill in modern software development.


Wilson Xu is a developer and writer exploring the intersection of AI tooling and practical software development. He has published tools on npm, contributed to open-source projects, and writes about developer productivity.
