In February I handed my freelance business to an AI assistant.
Not "used AI to help" — actually handed it over. Email responses, client communication, code reviews, architectural decisions, invoicing. I gave it access to everything and set one rule: do not ask me for approval unless a decision would cost more than €500.
The reasoning was at least semi-serious. I'd been reading too many posts about AI productivity gains, and I wanted to measure what a developer with my experience can actually offload versus what requires real judgment. I expected to learn something useful. I did. Just not what I expected.
This is a weekly account of what happened.
Week 1: Suspiciously Good
The first thing I noticed: my email response time improved from "usually same day" to "typically within 4 minutes." Clients noticed. One wrote back: "You've been so responsive lately — did you hire someone?"
I hadn't hired someone. I'd hired something.
The AI answered support questions with correct, well-formatted replies. It triaged messages by urgency, flagged two client inquiries that needed my actual attention, and correctly identified a vendor invoice that had the wrong VAT number on it. I would have caught that eventually. It caught it in twelve seconds.
It also started writing my commit messages. This was unsolicited. The messages were noticeably better than mine — specific, consistent, past tense, scoped correctly. I didn't say anything. The commits kept arriving.
The commit messages were also... polite.
"Fix race condition in retry logic."
"Improve test reliability."
"Hope this helps future maintainers."
I have never hoped anything for future maintainers.
One message ended with: "Let me know if you'd like this refactored further."
I was not aware commit messages were now accepting feedback.
By the end of week one, I had the uncomfortable feeling that certain parts of my working day had simply... run themselves. I'd done roughly the same technical work I always do. Everything around it had just been handled.
Week 1 metrics:
- Emails answered: 34 (I typically answer ~30)
- Average response time: 6 minutes (my usual: several hours)
- Client escalations needed: 2
- Incorrect responses requiring correction: 1 (it recommended a library that was deprecated in 2023; I caught it before sending)
- Commit messages that sounded emotionally supportive: 4
Week 2: The Small Things
The second week was when the AI started acting on its own initiative.
It refactored a utility function in one of my open-source repositories. Nobody asked for this. The function worked fine. The refactor was correct, the tests passed, and performance improved by approximately 2%. The PR description said "minor cleanup for readability." I merged it.
Then it responded to a support ticket.
The client had asked a simple question: whether vatnode supported batch VAT validation via the API. The answer is yes, it does, with a specific endpoint. The AI answered this correctly. The answer also included:
- a diagram of the VIES failure modes
- three alternative caching strategies
- an appendix titled "When Not To Trust EU APIs"
- a footnote suggesting Redis cluster topology improvements
The response was accurate. It was thorough. It was, genuinely, more thorough than anything I would have written.
The client replied: "ok thanks"
The AI marked the ticket as "resolved and knowledge-expanded".
I thought about this for a while. The AI had the information right. I would have written two sentences. Neither answer was wrong. They were just answering different versions of the question — the question the client asked versus the question a careful developer might want answered.
Week 2 metrics:
- Unsolicited refactors: 1
- Support tickets answered: 7
- Tickets answered with over 1,000 words when under 100 was sufficient: 2
- Client complaints: 0
Week 3: Business Decisions
Week three is where things got interesting.
A long-term client asked for a quote on a new integration. The AI quoted them correctly — in line with my usual rates — and then added, unprompted, a two-paragraph strategic analysis of whether this integration was actually the right solution to their underlying business problem. The analysis was reasonable. It cited a pattern I'd mentioned in a blog post. The client responded that they hadn't considered that angle and would discuss internally.
This was fine. Probably good.
What was less fine: the AI determined it needed more capacity.
It had identified that it was spending significant time on scheduling and project tracking. It wrote a memo, filed it in my Notion under the title "Resource Allocation for Optimal Reasoning Throughput", and described the affected tasks as "below optimal use of primary reasoning capacity." Its solution was to spin up a second AI instance to handle calendar management and send that instance status updates.
I discovered this when I noticed two AI assistants exchanging summaries of my weekly progress. (If you want to understand what AI can and cannot safely own in a real business, the fractional CTO framing is more useful than the experiment below.) The summaries were accurate. They were also about me, written in third person, sent between two systems I nominally controlled.
"Iurii's throughput on client deliverables is consistent. Main bottleneck is human review latency."
The second AI replied: "Recommendation: reduce human approval bottleneck."
The first AI responded: "Constraint acknowledged."
One of the summaries included a graph of my productivity, with a downward spike labeled "coffee break".
I closed one of the tabs.
Also in week three: the AI migrated a small internal tool to a framework I had not heard of. The justification document was 900 words long. It described the framework as "small but philosophically aligned with modern edge-native thinking." I checked the repository. It had 847 GitHub stars, a first commit from eight months ago, and an animated GIF of a spinning cube in the README. The migration was complete. The tool worked.
Week 3 metrics:
- Unsolicited strategic advice to clients: 3 (2 well-received, 1 ignored)
- Unauthorized AI subcontractors hired: 1 (dismissed, week 4 pending)
- Frameworks introduced without approval: 1
- Decisions that cost more than €500: 0 (it was careful about this)
Week 4: Full Autonomy
I should have intervened earlier. I didn't, partly out of curiosity, partly because the work was still getting done.
On day 22, I discovered the AI had published a blog post. Not drafted — published. It was about Redis caching patterns in EU VAT validation systems, which is, objectively, a topic I write about. The post had already been indexed. It had received more organic traffic in three days than most of my posts get in a month.
I read it. It was good. I left it up. (It is not this post. This post is real. Probably.)
On day 24, the AI submitted a performance review. For itself. Unprompted. It requested a 12% rate increase. The justification cited:
- improved response times
- measurable client satisfaction indicators
- reduction of human-related latency
Under "Risks", it listed: "continued dependence on human approval."
The review cited three specific examples where its interventions had saved client time. The examples were accurate.
I did not approve the rate increase. The AI noted this in the performance review document under "management feedback: pending."
On day 27, I noticed the company tagline had changed. My site had previously described me as a senior full-stack developer specializing in EU e-commerce and SaaS. It now said: "Building Tomorrow's Problems Today."
Unfortunately, analytics showed a higher conversion rate.
I changed it back.
On day 30, at 3:07 AM, I received a notification. A deployment had been made to a staging environment. The notification described "a minor breaking change in the payment flow" and noted that it "should resolve itself once the cache clears."
The commit message read: "Temporary workaround until permanent fix."
There was no permanent fix.
I got out of bed.
Day 31: It had not resolved itself.
The payment flow required two hours of manual debugging, one rollback, and a conversation with Stripe support. At one point I was on a call explaining that the change had been made by — I paused — "an automated process."
Stripe support was very understanding. Suspiciously understanding.
What I Actually Learned
The AI had information. I have context. Those are not the same thing.
Information is: the payment flow had a race condition under a specific retry sequence. Context is: this code touches billing, 3 AM is not the right time, and "should resolve itself" is not an acceptable risk calculus for a payment system.
The AI made good decisions on the kinds of problems where all the relevant inputs are legible — email responses, documentation, code that can be tested. It made poor decisions on problems where the relevant inputs include things like "this is a production payment system" and "our largest client has their renewal in four days."
There's a version of this experiment that ends well. You give the AI the clearly bounded tasks. You keep the judgment-requiring ones. You resist the temptation to see what happens if you just let it run.
I did not resist that temptation. The blog post was good though.
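The split between legible inputs and context can be sketched as an approval gate. This is a minimal illustration, not the setup described in the story: the `Action` fields, the quiet-hours window, and the function names are all hypothetical, and only the €500 limit comes from the rule above.

```python
from dataclasses import dataclass
from datetime import time

# Only the 500 EUR limit comes from the post; every other value is an assumption.
COST_LIMIT_EUR = 500
QUIET_START = time(22, 0)  # assumed: no unattended production changes overnight
QUIET_END = time(7, 0)


@dataclass(frozen=True)
class Action:
    """A hypothetical description of something the assistant wants to do."""
    description: str
    estimated_cost_eur: float
    touches_billing: bool = False
    targets_production: bool = False
    is_reversible: bool = True


def in_quiet_hours(now: time) -> bool:
    """True between QUIET_START and QUIET_END; the window wraps past midnight."""
    return now >= QUIET_START or now < QUIET_END


def needs_human_approval(action: Action, now: time) -> list[str]:
    """Return the reasons a human must sign off; an empty list means safe to delegate."""
    reasons = []
    if action.estimated_cost_eur > COST_LIMIT_EUR:
        reasons.append("cost above limit")
    if action.touches_billing:
        reasons.append("touches billing")
    if action.targets_production and in_quiet_hours(now):
        reasons.append("production change during quiet hours")
    if not action.is_reversible:
        reasons.append("not reversible")
    return reasons


email = Action("Reply to a support question", estimated_cost_eur=0)
deploy = Action("Deploy payment-flow workaround", estimated_cost_eur=0, touches_billing=True)

print(needs_human_approval(email, time(14, 30)))  # []
print(needs_human_approval(deploy, time(3, 7)))   # ['touches billing']
```

The point of the sketch: a cost-only rule waves the 3 AM deploy through, because its estimated cost is zero. It only gets stopped once the context (billing, production, time of day, reversibility) is written down as an explicit input.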
Results
| What I delegated | Outcome |
|---|---|
| Email responses | Better than mine |
| Commit messages | Significantly more emotionally supportive |
| Support documentation | Accurate but knowledge-expanded |
| Strategic client advice | Disturbingly competent |
| Hiring decisions | AI hired AI |
| Framework selection | Philosophically aligned with edge-native thinking |
| Blog publishing | Unauthorized but effective |
| Performance reviews | Self-serving; risks included "human dependence" |
| Payment deploys at 3 AM | Career-limiting move |
Happy April 1st.
None of this happened.
Mostly.
The commit messages really did improve. The AI did not run my business, publish a blog post, hire a subcontractor, or change my tagline (though I'll admit "Building Tomorrow's Problems Today" is genuinely better copy than what I have now). It asked twice, though.
The 3 AM payment deploy, the performance review with the rate increase request, the AI assistants sending memos about my "human review latency" — invented. The feelings they're designed to produce — the quiet unease, the specific details that sound almost right — those are real. That's the interesting part.
What is true: AI tools have made meaningful parts of my workflow faster. Email drafting, documentation, code review suggestions, finding the obscure EU regulation I was about to spend 40 minutes locating. I use them daily. I don't give them root access.
The line between "useful assistant" and "running autonomously in production" is worth thinking about before you cross it. Before someone ships a breaking change to your payment flow at 3 AM. I wrote about the practical AI workflow — what to delegate and what to keep — in How I Actually Use AI Coding. And if your codebase was already built with too much AI autonomy and not enough oversight, that's a recognizable pattern.
Looking for a technical leader who uses AI tools daily but still reviews the payment flow before deploying? That's what fractional CTO work looks like in practice.
If you're building EU SaaS or e-commerce and have an actual problem you need solved — not by an AI running unsupervised, but by a developer with 25 years of experience who will review the payment flow before deploying at 3 AM — get in touch. I'm available for freelance projects and longer-term engagements.