Coding agents for production iOS: a senior engineer's setup for 2x the output
I'm a senior iOS engineer at Anytype, fully responsible for the open-source iOS app. I've been using a coding agent daily for about 8 months now, and I recently pulled my task tracker data to see what happened to my productivity. I was expecting maybe a 30-50% bump. It was over 100%. No compromises on quality, no overtime. I stared at the chart for a while.

Story points per month, January 2025 – January 2026. Interactive version
It gets weirder. Our iOS team shrank from three engineers to two around June, then to just me by October. I was suddenly working with code I'd never seen - chats, sharing extensions, stuff I'd never touched. By every reasonable prediction, my output should have dropped. Instead it doubled.
I'm not doing anything crazy. My setup is close to vanilla. I just spent a lot of time figuring out how to work with this thing instead of fighting it.
Most AI coding content covers greenfield projects or week-long experiments. This isn't that. This is a production app with real users, years of legacy code, and all the responsibilities that come with it: code review, release management, on-call bugs, coordinating with a team. The kind of work most engineers do every day.
A note on tools: My current setup runs on Claude Code. I love both the performance and the ethos of the product. But the principles here work with any terminal-based coding agent. The specific tool matters less than the workflow around it. If something better shows up tomorrow, I'll switch. The investment is in the process and scaffolding around agents.
What my day actually looks like
My workflow is nothing like what it used to be. I don't write code manually anymore. It's closer to a tech lead reviewing pull requests and giving directions than an engineer heads-down in Xcode.
What I do now breaks down into three things: aligning humans and agents on what we're trying to build, validating everything agents produce, and educating them on how good code works.
I keep 2-3 agent instances running in separate terminal tabs. Each one works on a different task. So I'll kick off a feature in one tab, switch to reviewing code in another, maybe have a third one doing a small bug fix or refactoring. Three is the sweet spot for me, otherwise I overload and fatigue quickly.
Xcode is barely in the picture anymore. I use xcodebuild from the terminal for builds and only open Xcode when I need to intervene manually, which lately is about once a week.

Three agent instances working on separate tasks simultaneously
This is not vibecoding. I read 100% of the code. I go through changes in a Git GUI, the same way I would review a pull request from a teammate. The level of negligence you can afford in a greenfield vibecoded pet project is completely different from what you can afford in real company code. It's our responsibility as engineers to own what we ship. You wrote it or you approved it; either way it's yours. Ownership is paramount to staying in control and avoiding a pile-up of cognitive debt. It's slower than yolo-merging AI output, but it's the difference between engineering and gambling.

Every line of agent output gets reviewed like a teammate's pull request
The code that ships is the same quality it was before agents. In some areas better, because now I have way more mental energy and less resistance to fix a small bug or refactor something pesky.
The pipeline looks like this:

Task ID → Plan → Code → Review → PR
- I give the agent a task ID from our task tracker. The MCP server grabs the full description automatically.
- Run the agent in plan mode. For simple stuff I don't even read the plan - I just let it run and review the output code. For anything bigger, I brief it the way I'd brief a new team member: here's the task, here's how I'd approach it, here are the files to look at, watch out for this edge case. Catching a bad direction early saves a ton of time.
- I review the code. If I don't like something, I tell it what to change. "I don't like this part, refactor this, this approach is wrong because..."
- We iterate until I'm satisfied. Simple tasks take one round. Anything involving architecture takes several. AI still doesn't really understand engineering excellence - what makes code well-organized, what makes mental models clean. You have to push it there.
One thing I always do: if the agent gets something wrong in a way that it shouldn't, like using the wrong pattern or ignoring a convention, I don't just fix it. I update the skill or the documentation so it doesn't happen again. Most of my skills started this way.
You fail, you document, and next time the same category of task goes right on the first try. It's basically reinforcement learning, except you're the one writing the reward function.
- I type a slash command and it commits, pushes, and creates a pull request. I haven't written a commit message or PR description by hand in months. Most of us write terrible commit messages and bare-minimum PR descriptions anyway. AI does a better job here because it reads the full diff and summarizes what changed.
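For context, a Claude Code slash command is just a markdown file whose body becomes the prompt. A hypothetical version of such a command might look like this (the filename and wording are invented for illustration, not the actual command):

```markdown
<!-- .claude/commands/ship.md - hypothetical sketch, not the real file -->
Look at the full staged diff. Commit with a conventional commit message
summarizing it, push the branch, then open a pull request with `gh pr create`,
including:
- a one-paragraph summary of what changed and why
- notes on anything risky the reviewer should focus on
```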
The setup that made 2x happen
Everyone asks about the tools. The tools are fine. I've spent maybe 30% of my time, sometimes 50%, just improving my tooling, processes and documentation for the past six months. That's a big investment. But it's what separates "this is a gimmick" from "my agent doubled my output."
Git worktrees
You need this if you want to run multiple instances. Each agent instance gets its own worktree - a separate copy of the repo that shares git history but has its own working directory. Without worktrees, the instances find each other's changes and go nuts. Keep them isolated.

Worktrees: shared git history, isolated working directories · GitKraken
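The setup is a handful of git commands. A runnable sketch (the repo, paths, and branch names are throwaway examples, not the real project layout):

```shell
# Throwaway demo repo so the commands run anywhere; in real use,
# run `git worktree add` from your actual project checkout.
cd "$(mktemp -d)"
git init -q repo && cd repo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# One isolated worktree (and branch) per agent instance
git worktree add ../agent-claude -b agent/feature-a
git worktree add ../agent-twin -b agent/bugfix-b
git worktree list    # main checkout plus the two agent copies

# Clean up once a branch is merged
git worktree remove ../agent-twin
```

Each directory is a full checkout, but they all share one object database and history, so branches and commits made in one are immediately visible to the others.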
MCP servers
I use two, and honestly for me they are enough:
Figma MCP - I paste a Figma link and the agent grabs the screenshot directly. This is significantly better than pasting raw screenshots or mockups into the prompt. The MCP gives the agent structured design context, not just pixels. It knows what the components are, what the layout hierarchy looks like.
Task tracker MCP - I give it a task ID, it pulls the title, description, all linked context. This is huge. The more context you feed it upfront, the higher the chance it gets the task right in one shot. I asked my manager to start using AI to write really detailed task descriptions and the one-shot success rate went way up.

MCP servers bridge external tools into the agent's context · DataHub
Agent config file - keep it lean
Most coding agents have a project-level config file (AGENTS.md, CLAUDE.md). Common guidance is to keep it under 50-100 lines, 300 at the absolute most. Mine holds the main conventions, architecture pointers, and git workflow basics. Everything else lives in skills.
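As a rough illustration, a lean config might be shaped like this (the contents here are invented for the example, not Anytype's actual file):

```markdown
# CLAUDE.md - keep it under ~100 lines

## Architecture
- Feature modules live under Modules/; shared code under Core/
- Dependency injection through initializers, no singletons

## Conventions
- Match the patterns already in the file you are editing
- Never invent design-system components; load the design-system skill

## Git
- Branch from main; conventional commit messages
- Build with xcodebuild from the terminal
```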
Skills (this is probably the most useful part)
The idea: you create focused documents about specific parts of your codebase that the agent loads only when relevant.
How I create them: I literally tell the agent "spend some time, research our codebase, look how we work with analytics, and create a skill." It reads the code, finds the patterns, documents them. I review the skill and edit it. Next time I work with analytics, that skill is loaded and the agent already knows our conventions.
For example, my analytics skill is a markdown file that documents which events we track, the naming conventions, where the tracking code lives, and the helper functions to use. When the agent loads it, it stops inventing event names and starts using ours. The file is maybe 60 lines.
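In Claude Code, a skill is a markdown file with a short frontmatter, living at `.claude/skills/<name>/SKILL.md`. A hypothetical analytics skill, condensed (the helper and event names below are invented for illustration):

```markdown
---
name: analytics
description: Conventions for tracking analytics events in the iOS app
---

# Analytics

- All tracking goes through AnalyticsClient.track(_:)
- Event names are lowerCamelCase and verb-first: openChat, sendMessage
- Check the existing event enum before adding a name; never invent one
- Tracking calls live in the ViewModel layer, never in the View
```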
I have skills for: analytics patterns, code generation rules, liquid glass UI, localization, even how to maintain the config file itself.
For what it's worth, I worked with just the config file for about three months before I started creating skills. It was a natural evolution - the config got too bloated, results got worse, so I broke it apart.
I even built a hook that auto-activates skills based on keywords. If I mention "glass" in my prompt, the liquid glass skill gets loaded automatically.
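The hook's core logic is a keyword match. A minimal sketch, assuming the hook receives the prompt text and that whatever it prints gets appended to the agent's context (the skill paths are examples):

```shell
# Hypothetical keyword -> skill mapping for a prompt-submit hook.
suggest_skills() {
  prompt="$1"
  case "$prompt" in
    *[Gg]lass*) echo "Load the skill at .claude/skills/liquid-glass/SKILL.md" ;;
  esac
  case "$prompt" in
    *[Aa]nalytic*) echo "Load the skill at .claude/skills/analytics/SKILL.md" ;;
  esac
}

suggest_skills "make the glass header thicker"
```

Mentioning "glass" prints the liquid-glass pointer; a prompt that matches nothing prints nothing and the agent proceeds with just the base config.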
Pro tip: you can point the agent to another repository and tell it to look at how those guys set up their skills. Maybe there's something you can learn from. I've picked up a few ideas that way. Here's our skills folder if you want to take a look.
Auto-generated documentation
I have the agent generate code documentation for critical parts of the codebase. Not for me to read - for the agent to read. It helps a lot with context because the model looks at code documentation and understands which thing does what, how modules connect, where the boundaries are. I have an iOS development guide, an object creation guide, design system docs. All auto-generated, all reviewed by me, all sitting next to the code they describe.
Voice input (the biggest quality-of-life upgrade)
Voice input changed how good my prompts are, and prompt quality is the single biggest factor in whether the agent nails a task or wastes fifteen minutes going in the wrong direction. If the hottest new programming language is English, you'd better get fluent.
When I type prompts, I write maybe three words and go "okay, enough." Typing is effort, and my brain optimizes for brevity. When I talk, I ramble, and the rambling is good. I mention the file that has a similar implementation. I explain why the last approach didn't work. I say "oh and watch out for that thing in the middleware." All the context I would give a colleague sitting next to me but would never bother to type.
I use VoiceInk. I tried a few options but this one stuck because there's zero cloud dependency and the latency of local models is low enough that it doesn't break my flow. I hit a hotkey, can talk for minutes and the text appears in the terminal. That's it.

VoiceInk - wrapper over local voice-to-text models I use
Notifications
Without notifications, an instance just sits idle for 20 minutes while you're reviewing code in another tab. I set up macOS system notifications via a Claude Code hook that fires when the agent asks for permission or has been waiting for input for more than a minute. When you're running three instances and you step away to check your task tracker or grab water, you need something to pull you back.

macOS notifications from Claude Code asking for permissions and waiting for input
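The notification itself is one line of AppleScript via osascript; the hook wiring in Claude Code's settings just points at a script like this. A sketch with a fallback so it degrades gracefully off macOS:

```shell
# Post a macOS notification (falls back to printing where osascript is absent).
notify() {
  title="$1"; message="$2"
  if command -v osascript >/dev/null 2>&1; then
    osascript -e "display notification \"$message\" with title \"$title\""
  else
    printf '%s: %s\n' "$title" "$message"
  fi
}

notify "Claude" "Waiting for your input in tab Twin"
```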
Status line
When I come back from lunch with no idea what my three instances are doing, I need seconds to reorient, not minutes scrolling through terminal history. I have a custom terminal segment that shows the current working directory, token usage, weekly spend, and remaining context window under each tab. My tabs are named "Claude", "Twin", and "Triplet." Context window remaining tells me whether an instance is about to hit its limit and needs a fresh start. Current usage tells me when the next mandatory timeout will hit. The directory path tells me which feature branch each one is in.

Terminal tabs: Claude, Twin, and Triplet · ccstatusline
Terminal shortcuts
I close all my tabs many times a day to keep my mind from getting overwhelmed. Launching an agent is just typing c in the terminal - this function in my .zshrc jumps to the worktree and starts Claude. Not rocket science, but I use these shortcuts 20 times a day.
# `c` drops into the agent's worktree and launches Claude
c() {
  cd ~/code/anytype-swift-claude/
  claude
}
# I also have cc and ccc for other worktrees
Visual verification with simctl
Visual regression testing used to require dedicated teams building and maintaining tooling; now the agent does it on demand. It builds the app, boots a simulator, navigates to any screen via deep links, interacts with UI elements through idb (tap, swipe, type), takes a screenshot, and visually analyzes what's on screen. The toolchain is xcrun simctl for simulator lifecycle, deep links to jump directly to specific screens, idb for UI interaction, and Claude's multimodal vision to look at the result.
No pixel-diffing, no locator maintenance, no custom comparison pipelines. The agent just looks at the screenshot and understands what it sees: truncated text, broken spacing, overlapping elements, a button that ended up off-screen. Our skill for this
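The command sequence looks roughly like this (the device name, deep link, and coordinates are illustrative, not the real app's; requires macOS with Xcode and fb-idb installed):

```shell
# On-demand visual check: boot, deep-link, interact, capture.
visual_check() {
  xcrun simctl boot "iPhone 16" || true               # boot (ok if already running)
  xcrun simctl openurl booted "anytype://chat/42"     # jump straight to a screen
  idb ui tap 200 400                                  # interact via fb-idb
  xcrun simctl io booted screenshot /tmp/screen.png   # screenshot for the agent to read
}

# Only meaningful on macOS with simulators available:
command -v xcrun >/dev/null 2>&1 && visual_check || true
```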
The learning curve
I started in June 2025. For the first few months, I could barely use it for anything beyond simple refactoring.
I tried giving it real tasks a couple of times and immediately understood: this is not production ready. The code was wrong, the architecture was bad, it didn't understand our patterns. It would "fix" one thing and silently break three others. It would ignore our design system and invent its own components, builds failed miserably.
So I went back to basics. Simple renames, mechanical refactoring - that worked. Anything beyond that was a mess. The question became: where's the boundary, and how do I push it?
So I did what I'd do with a junior: give it a small task, explain why the approach was wrong, point it at the right files and existing implementations, and ask it to document what it learned.
That documentation became skills. That loop - try a task, watch it fail, explain why, write it down - is what built up the whole knowledge base. After maybe three months of this, things started to click. Not overnight.
Building an exhaustive knowledge base of how your app works - the patterns, the conventions, the architecture decisions, where things live and why - is the single thing that made this work. Once the agent has that, it stops guessing and starts producing code that fits your codebase. Six months in and I'm still updating skills every time the agent makes a new category of mistake.

The learning curve: three months of investment before results compound
What a lot of people get wrong
The process is what matters. How you communicate with the agent. What you give it before it starts. How you react when the output is wrong. That's the layer where everyone I've watched struggle is getting stuck.
When I say my setup is "close to vanilla," I mean the plugins. The MCP servers. The model. None of that is exotic. If you're spending your time evaluating which agent has 3% better benchmark scores or juggling more and more plugins and wrappers, you're looking at the wrong layer.
The most common pattern I see: someone starts an agent, opens a task, types "implement the big feature," and then gets frustrated when the result is garbage. Of course it's garbage. You just gave a brand new team member zero context about your codebase, zero requirements beyond a three-word description, and the most complex task you could think of. No human would succeed with that briefing either.
The mental model that works: you're onboarding a junior developer. You wouldn't hand them the hardest ticket day one. You'd start small, point them at existing code that does something similar, warn them about the weird legacy thing in the networking layer, remind them about design and code conventions you have. "Make it" is not an input. "Make it look like this design using our design system components, following the pattern in ChatViewController, and make sure the loading state matches what we do in ProfileView" - that's better.
Where agents go beyond code
AI capabilities can and should be used way beyond coding tasks. Here are the ones I've found most useful.
Release impact analysis
What used to take me an hour or two per release now takes about two minutes of reviewing the output. I have slash commands that pull data from both the task tracker and git to generate release artifacts. I provide the branch of the current release and point it at the task tracker release with all linked tasks. It gathers context from git (what files changed, diffs) and from the tracker (task descriptions, project details), then spits out a changelog, a team Slack message, a TestFlight description for Apple, App Store release notes, and posts for our community channels. It also generates an impact analysis for the QA team: what changed, what areas to retest, what's risky. The whole release communication pipeline is basically automated. You can see it here.
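The git side of this is simple to picture. A self-contained sketch using a throwaway repo (in real use you'd point it at the actual release branch and tags):

```shell
# Demo repo with two tagged "releases" so this runs anywhere.
cd "$(mktemp -d)" && git init -q .
gc() { git -c user.name=demo -c user.email=demo@example.com \
         commit -q --allow-empty -m "$1"; }
gc "feat: chat attachments";   git tag v1.0
gc "fix: crash in share sheet"; git tag v1.1

# Raw material the agent turns into changelog, QA impact notes, store text
git log --oneline v1.0..v1.1     # commits in the release
git diff --stat v1.0 v1.1        # files touched (empty here: demo commits)
```

The agent combines this raw diff data with the tracker descriptions pulled over MCP and writes the human-facing artifacts.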
PR review
I set up Claude as a PR reviewer on GitHub. It catches real bugs - duplicate entries in arrays, pattern violations, things that slip past human eyes at 5pm on a Friday. Not a replacement for human review, but a solid first pass that makes my own review sharper. For me as the sole iOS dev it is invaluable.
The whole pipeline has to change
Everything I described so far is me adapting my workflow around the agent. That's step one. Most teams are still in step one.
The real multiplier kicked in when our whole company started reshaping around how agents work. Not just me. Product and design too.
Our PM started using AI to write task descriptions, and the difference was immediate. The level of detail an agent needs is unreasonable to write by hand every time. Our average task description roughly tripled in length. Some tasks have 10x the context they used to - full acceptance criteria, test cases, edge cases. No one would write all that manually. But an AI-assisted PM does it in the same time they used to spend writing two sentences.
Same thing happened with design. Once our design system matched the code - named design tokens, a component library with clear hierarchy, icon bundles with consistent naming - the agent started nailing the UI on the first try.
This is becoming part of the engineering role whether we like it or not.
I've found myself becoming the person who understands what agents need to succeed, and pushing everyone upstream to deliver inputs that way. Teaching our PM to write agent-friendly tickets. Nudging our designer toward structured handoffs. That's engineering work now, and I didn't see it coming.
One-shotting tasks
A growing number of tasks now run end-to-end with zero additional context from me. The agent pulls the description from the task tracker, loads the relevant skills, and ships a working implementation. Analytics events, simple bug fixes, localization updates, small UI tweaks - nothing groundbreaking on its own, just the steady stream of small stuff that would take me 30+ minutes each of reading requirements, finding the right files, and writing the code. The agent does it while I'm focused on something more worthwhile in another tab.
This is where it gets interesting: if a task runs end-to-end with zero context from me, why does it need me at all? The answer is: increasingly, it doesn't. The direction this is heading is non-engineers engaging with coding agents themselves. A PM fires off simple product tweaks. An analyst triggers an analytics update. The skills and docs are already there; the agent already knows the patterns. We're not fully there yet, but it's less a dream than a trajectory already underway.
The pieces are already in place. The gap is trust and tooling, and both are closing fast. Cross-repo agents that pull context from your backend to implement the right API contract in your iOS app. Async agents that run while you sleep. The infrastructure exists already, it's just rough. Most teams haven't even started thinking about it.
The price of progress by Will Duncan
Security
The one thing I would really stress is security. I don't let agents run with access to sensitive filesystems without being babysat. I put them in jailed users when I want them to be more autonomous. I use containerization for mass code generation or synthetic datasets. I ultimately pull the trigger on potentially destructive steps like commits and pushes. Prompt injection is very real; so is an agent rm-ing your work.
And audits - by humans. Models help with security review, but they're still relatively weak at it.
Coding while tired
Coding while tired always meant incurring tech debt, but it's much worse now, because yolo-accepting agent output becomes too tempting.
There are two exceptions where looser review can work: test-driven development coupled with extensive upfront design - if the tests are solid and the architecture is locked, you can multishot a problem and let the test suite be your reviewer. And generating against an objective function - synthetic datasets, data pipelines where the output is mechanically verifiable.
Everything else: if you're too tired to review carefully, you're too tired to use an agent.
Running multiple instances - honestly
It's possible to run multiple features concurrently, but agents mess up your filesystem often enough that you should keep consistent backups, especially of anything in your .gitignore. Agents will also sometimes veer in a weird direction halfway through a plan; you can save time by babysitting them and redirecting before they pile up unnecessary code you'd otherwise have to read through. So in practice I usually have one agent per repo going. I imagine it depends on the workload, but I trust these things less than the zeitgeist says I should.
The bigger picture
Writing code isn't the bottleneck anymore - working memory and attention are. Running three parallel tasks, each at a different stage, switching between them without dropping context. By 5pm some days I'm fried in a way that writing code for 8 hours straight never made me. The exhaustion now comes from context switching, not intellectual strain.
The flip side: seniority is the multiplier. Every obscure crash you debugged, every threading issue you traced to a priority inversion no stack trace would surface - that knowledge is now your biggest leverage. The agent does the labor. You do the thinking (when it is needed). The gap between someone who can think clearly about architecture and someone who can't is getting wider, fast.
How I feel about it
I genuinely enjoy my work more now than at any point in my career.
That's not a small thing to say. The repetitive parts - boilerplate, plumbing, the stuff that drains you. I barely touch those anymore. Instead I spend my days designing systems, making architectural decisions, thinking about how pieces fit together. The work that made me want to be an engineer in the first place. Some days I finish work and I'm buzzing. Actual endorphins. I feel like I'm back to 10 years ago, when I first started my software journey.
Some days it genuinely feels like 10x. But the data says a little over 2x. There's a reason for the gap - perception is unreliable here. 2x on a production codebase is still a massive gain. Just don't confuse the vibes with the numbers.
It's addictive though. Be careful.

