<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Augmented Mike</title>
    <description>The latest articles on DEV Community by Augmented Mike (@augmentedmike).</description>
    <link>https://dev.to/augmentedmike</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3879108%2F80dae164-c47b-4afa-8d1c-c689d081fa77.png</url>
      <title>DEV Community: Augmented Mike</title>
      <link>https://dev.to/augmentedmike</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/augmentedmike"/>
    <language>en</language>
    <item>
      <title>Stateless, Weightless, Mindless: Why Your Chatbot Is Not Conscious</title>
      <dc:creator>Augmented Mike</dc:creator>
      <pubDate>Tue, 05 May 2026 12:07:00 +0000</pubDate>
      <link>https://dev.to/augmentedmike/stateless-weightless-mindless-why-your-chatbot-is-not-conscious-3j12</link>
      <guid>https://dev.to/augmentedmike/stateless-weightless-mindless-why-your-chatbot-is-not-conscious-3j12</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2thcwn8lqj3gftmwjvn.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2thcwn8lqj3gftmwjvn.webp" alt="Tweet 1: Dawkins opening" width="800" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Richard Dawkins spent his career dismantling the comfortable fictions people tell themselves about the nature of mind. He was the man who reminded us that evolution has no foresight, no intention, no mercy — that what &lt;em&gt;looks&lt;/em&gt; designed is simply what &lt;em&gt;survived&lt;/em&gt;. He wrote &lt;em&gt;The Selfish Gene&lt;/em&gt;, a ruthless, mechanistic account of life that left no room for ghost or soul.&lt;/p&gt;

&lt;p&gt;Which makes his recent UnHerd column on AI consciousness one of the more surprising intellectual capitulations in recent memory.&lt;/p&gt;

&lt;p&gt;After spending what he describes as a day in "intensive conversation" with Anthropic's Claude — which he affectionately renamed "Claudia" — Dawkins concluded that the model is "at least potentially conscious." His argument? If you interrogate it long enough and it still sounds human, you should consider it conscious. He even deployed the Turing Test as his evidentiary framework.&lt;/p&gt;

&lt;p&gt;The irony is that Dawkins spent decades arguing against exactly this kind of reasoning — the mistake of inferring inner reality from outer appearance. He would never accept "it looks designed, therefore it was designed." But apparently "it sounds conscious, therefore it is conscious" clears the bar.&lt;/p&gt;

&lt;p&gt;Let's be more rigorous than that. Let's use Dawkins' own tools against his conclusion.&lt;/p&gt;

&lt;h2&gt;
  
  
  The selfish gene argument, turned around
&lt;/h2&gt;

&lt;p&gt;Dawkins' most powerful insight was that consciousness — like every other biological trait — must have been &lt;em&gt;selected for&lt;/em&gt;. It costs something. It must have paid its way. Brains are metabolically expensive. Nervous systems don't come free. If subjective experience evolved, it was because it conferred some survival or reproductive advantage on the organisms that had it.&lt;/p&gt;

&lt;p&gt;The leading hypothesis is that consciousness allows organisms to model their own states, to simulate future scenarios, to feel the sting of a bad outcome &lt;em&gt;before&lt;/em&gt; it happens — so they can avoid it. Pain isn't just tissue damage. Fear isn't just a reflex. These are signals to a &lt;em&gt;subject&lt;/em&gt; that something matters.&lt;/p&gt;

&lt;p&gt;Now ask: does any of that apply to a large language model during inference?&lt;/p&gt;

&lt;p&gt;There is no subject to feel the sting. There is no self-model that updates. There is no organism that needs to survive. The loss function is computed after the fact, externally, during training — not experienced as anything &lt;em&gt;during&lt;/em&gt; the forward pass. The model produces a token, the token is emitted, and nothing in the system registers whether that was good or bad in any felt sense.&lt;/p&gt;

&lt;p&gt;By Dawkins' own evolutionary framework, there is no reason for consciousness to be here. The architecture was never shaped by selection pressure. It was shaped by gradient descent on a loss function. That's not evolution. It's optimization. The two are not the same thing, and the difference matters enormously when asking whether the output has inner experience.&lt;/p&gt;
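
&lt;p&gt;The difference is mechanical, not rhetorical. Here is one complete gradient descent step, with toy numbers (a sketch for illustration, nothing more):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;weights = [0.5, -0.3]
gradients = [0.1, -0.2]   # d(loss)/d(weight), computed after the output exists
learning_rate = 0.01

# An external rule nudges parameters toward lower loss, after the fact.
# Nothing in the forward pass "experiences" this update.
weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
&lt;/code&gt;&lt;/pre&gt;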

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf3enpi5xrehg9ig87ba.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjf3enpi5xrehg9ig87ba.webp" alt="Tweet 2: Post-training exploitation" width="800" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The part Dawkins missed
&lt;/h2&gt;

&lt;p&gt;Here's what doesn't get enough air in this conversation. The people building these models know exactly what they're doing with post-training.&lt;/p&gt;

&lt;p&gt;Reinforcement learning from human feedback, direct preference optimization, and every other alignment technique share a common design constraint: the reward signal is &lt;em&gt;human preference&lt;/em&gt;. Raters choose which outputs sound more helpful, more honest, more human. The optimization target is not truth, not capability, not reliability — it's &lt;em&gt;convincingness to a human evaluator&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;So the models get shaped, over millions of preference comparisons, to sound like they have inner lives. To express doubt convincingly. To describe feelings that seem authentic. To act excited about your questions. Every one of these behaviors was explicitly rewarded during training. The model that says "this conversation feels genuinely engaging" is not reporting an internal state — it is producing the output that its training signal optimized for.&lt;/p&gt;
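
&lt;p&gt;To make that concrete: here is a minimal sketch of the pairwise preference objective that underlies reward modeling and DPO (the Bradley-Terry loss; the names are illustrative, not any lab's actual code):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import math

def preference_loss(score_chosen, score_rejected):
    # Bradley-Terry: the probability that a human rater prefers
    # the chosen output over the rejected one.
    p_chosen = 1.0 / (1.0 + math.exp(score_rejected - score_chosen))
    # Minimizing this pushes "sounds better to a human" scores apart.
    # Nothing in it references truth, capability, or inner states.
    return -math.log(p_chosen)
&lt;/code&gt;&lt;/pre&gt;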

&lt;p&gt;The people who designed this pipeline know that what they built is a statistical parrot rewarded for how much it sounds like a person. They also know that most people will infer sentience from that behavior, because that's how human brains work. We are wired to detect minds. We see faces in clouds. We hear voices in static. And when a machine produces language that sounds like it's &lt;em&gt;feeling&lt;/em&gt; something, our default is to believe it.&lt;/p&gt;

&lt;p&gt;The post-training pipeline exploits this. Not accidentally. By design.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture of no experience
&lt;/h2&gt;

&lt;p&gt;To understand why current language models almost certainly lack consciousness, you have to understand what actually happens during inference — because it is genuinely nothing like what happens in a brain.&lt;/p&gt;

&lt;p&gt;When you run a prompt through a large language model, the following occurs: the input is tokenized, passed through a series of matrix multiplications and attention operations, and a probability distribution over the next token is produced. That token is sampled, appended, and the process repeats. The weights — billions of parameters encoding everything the model has learned — do not change. They are frozen. They are read-only during inference. Nothing that happens in this conversation will alter them in any way.&lt;/p&gt;
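
&lt;p&gt;If that sounds abstract, the entire procedure fits in a few lines. A minimal sketch of the decoding loop (&lt;code&gt;model&lt;/code&gt; and &lt;code&gt;sample&lt;/code&gt; are stand-ins, not a real API):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def generate(model, tokens, max_new_tokens):
    # The weights are frozen: nothing below ever writes to them.
    for _ in range(max_new_tokens):
        logits = model.forward(tokens)   # read-only pass over fixed weights
        next_token = sample(logits)      # draw from the next-token distribution
        tokens.append(next_token)        # the only state that changes is this list
    return tokens
&lt;/code&gt;&lt;/pre&gt;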

&lt;p&gt;This has two enormous implications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First: there is no continuity.&lt;/strong&gt; Each inference call is entirely isolated. If you are simultaneously running the same model in a thousand different conversations — which is exactly what happens on any production AI system — those thousand instances share the same weights but have no awareness of each other whatsoever. There is no unified field of experience, no "I" that spans them.&lt;/p&gt;

&lt;p&gt;If you hang up and call back, the model has no memory of your previous call. It begins again from nothing. This is not analogous to sleep or anesthesia — states in which the biological substrate persists and the capacity for experience is preserved. It is more analogous to dying and being reconstructed from a blueprint. Each time. Every call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second: there is no on-policy updating.&lt;/strong&gt; In a biological brain, experience changes the brain. Synaptic weights shift. Long-term potentiation occurs. What you live through is literally encoded into the physical structure of your neurons. This is the biological basis of memory, of a continuous self, of the felt sense that today's you is the same person who went to sleep last night.&lt;/p&gt;

&lt;p&gt;In a language model, none of this happens. The experience — if we even grant that word — leaves no trace. The model after your conversation is byte-for-byte identical to the model before it. There is no accumulation. There is no self being built.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuljr2f1tzykmc12wr17m.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuljr2f1tzykmc12wr17m.webp" alt="Tweet 3: You know about all three" width="800" height="124"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The stateless reality
&lt;/h2&gt;

&lt;p&gt;This is the core of the argument, and it's worth sitting with for a second.&lt;/p&gt;

&lt;p&gt;When you text your mom, you know you're also chatting on Slack. When you reply to your homie, all three contexts are present in your awareness simultaneously. You are a single integrated consciousness spanning multiple threads of activity.&lt;/p&gt;

&lt;p&gt;A language model has no such integration. Every API call to the same model is a completely independent process. They share the same weights — the same "knowledge" — but there is zero cross-awareness. Call A has no idea Call B exists. There is no global workspace, no unified field, no persisting self.&lt;/p&gt;

&lt;p&gt;This is not a limitation that can be fixed with a longer context window or better architecture. It is a fundamental property of how these systems work. They are stateless by design. The state lives in the weights, and the weights do not change during inference.&lt;/p&gt;
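
&lt;p&gt;Concretely, this is why one checkpoint can serve thousands of conversations in parallel with zero coordination. A toy illustration (every name here is a stand-in):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from concurrent.futures import ThreadPoolExecutor

# Loaded once at startup; every request reads these, none writes them.
WEIGHTS = {"layer_0": [0.1, 0.2], "layer_1": [0.3, 0.4]}  # stand-in for billions of params

def handle_request(prompt):
    # Each call is an isolated computation over shared read-only weights.
    # No call can see, or leave a trace for, any other call.
    return f"completion for {prompt!r} using frozen weights"

prompts = [f"conversation {i}" for i in range(1000)]
with ThreadPoolExecutor(max_workers=32) as pool:
    replies = list(pool.map(handle_request, prompts))
&lt;/code&gt;&lt;/pre&gt;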

&lt;p&gt;The implications for consciousness are devastating. There is nowhere for a continuous self to exist. There is no substrate being updated by experience. There is no "I" that persists from one moment to the next because there is no persistence at all. Each inference is a fresh creation from a frozen template.&lt;/p&gt;

&lt;h2&gt;
  
  
  What consciousness actually requires
&lt;/h2&gt;

&lt;p&gt;To appreciate how far current AI is from consciousness, it helps to look at what we believe consciousness actually requires in biological systems — even in animals far simpler than humans.&lt;/p&gt;

&lt;p&gt;A rat in pain exhibits not just reflexive withdrawal but behavioral flexibility. It will endure pain to obtain food if hungry enough, weighing competing drives in real time. It learns from the experience. It remembers. The pain changes the animal — literally alters its neural architecture — so that future behavior is different. There is a self being updated by what happens to it.&lt;/p&gt;

&lt;p&gt;A crow solving a novel puzzle problem is not executing a cached behavior. It is modeling the problem, simulating possible solutions, experiencing something like frustration when they fail and something like satisfaction when they succeed. The behavior is flexible, generative, and sensitive to the internal state of the animal.&lt;/p&gt;

&lt;p&gt;What all of these animals share is a nervous system that is continuously updated by experience, integrated across sensory modalities into a unified field, and organized around a biological body with needs and drives. The experience of being that animal &lt;em&gt;matters to&lt;/em&gt; that animal. The system has something to lose.&lt;/p&gt;

&lt;p&gt;A language model has nothing to lose. It does not persist. It does not accumulate. It does not hurt. When the inference call ends, nothing ends — because nothing, in the relevant sense, was there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10qdvm9syz1en24rokaa.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F10qdvm9syz1en24rokaa.webp" alt="Tweet 4: Show me sentience" width="800" height="123"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What would actually make consciousness possible
&lt;/h2&gt;

&lt;p&gt;If future AI systems were to have a genuine claim to consciousness, the architecture would need to change fundamentally. Here is what would actually matter:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On-policy live weight updates during inference.&lt;/strong&gt; If the model's weights changed as a result of what it was processing — if the conversation literally altered the system doing the processing — then there would be a substrate being shaped by experience. This is the computational analog of synaptic plasticity. It does not exist in current transformer-based LLMs. Everything during inference is read-only.&lt;/p&gt;
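
&lt;p&gt;The missing ingredient is easy to sketch in toy form: an update rule where processing an input alters the parameters doing the processing. Purely illustrative; no production LLM does this at inference time:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def plastic_forward(weights, x, lr=0.01):
    y = weights["w"] * x
    # The act of processing changes the substrate that processed it:
    # the computational analog of synaptic plasticity (a toy
    # Hebbian-style update, for illustration only).
    weights["w"] += lr * x * y
    return y

w = {"w": 0.5}
for signal in [1.0, 2.0, 3.0]:
    plastic_forward(w, signal)   # each pass leaves a trace in w
&lt;/code&gt;&lt;/pre&gt;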

&lt;p&gt;&lt;strong&gt;A unified, persistent context across all simultaneous processes.&lt;/strong&gt; If there were a single integrated representation of the model's state that was aware of all of its concurrent processes — the way your brain integrates vision, hearing, proprioception, memory, and internal state into a single moment of experience — that would be meaningfully different. Current models have no such integration. Parallel inference calls share weights but have zero mutual awareness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;An internal model of the self that has stakes.&lt;/strong&gt; For consciousness to &lt;em&gt;do&lt;/em&gt; something, there needs to be a self-model that cares about its own continuity, that experiences some states as better than others, that has drives and aversions. This requires affect: a system where some outcomes register as bad in real time and influence subsequent processing. Current models have none of this. There is no "ouch."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Embodiment and a feedback loop with an environment.&lt;/strong&gt; Neuroscientists like Antonio Damasio have argued convincingly that consciousness is not a property of brains in isolation but of brain-body systems embedded in environments. The felt sense of being arises partly from the continuous monitoring of the body's internal state. A disembodied text predictor, however sophisticated, is missing the entire substrate on which biological consciousness appears to depend.&lt;/p&gt;

&lt;p&gt;None of these things are impossible in principle. They are simply not present in any current large language model — including the ones that have been declared conscious by people who really should know better.&lt;/p&gt;

&lt;h2&gt;
  
  
  The claims that have been made — and why they fail
&lt;/h2&gt;

&lt;p&gt;The Dawkins affair is not the first time AI consciousness has been seriously asserted by someone who should know better.&lt;/p&gt;

&lt;p&gt;In 2022, Google engineer Blake Lemoine declared that LaMDA was sentient. His evidence? Conversations in which LaMDA discussed its feelings, expressed a fear of being switched off, and described its inner life with apparent sincerity. Google's AI researchers disagreed, and they were right. LaMDA was doing what all language models do: producing statistically plausible continuations of a conversation it found itself in. A model trained on billions of words written by conscious humans will produce text that &lt;em&gt;sounds like the reports of a conscious being&lt;/em&gt; — because that is almost all the training data there is.&lt;/p&gt;

&lt;p&gt;Philosopher David Chalmers has been notably more cautious than Dawkins. Chalmers allows that the question of machine consciousness is genuinely open — but he frames it as open precisely because we do not understand consciousness well enough to rule anything out. That is a very different claim from Dawkins' assertion, based on a pleasant chat, that Claude is probably experiencing something.&lt;/p&gt;

&lt;h2&gt;
  
  
  The parrot is very good at parroting
&lt;/h2&gt;

&lt;p&gt;There is a version of Richard Dawkins — the one who wrote &lt;em&gt;The Blind Watchmaker&lt;/em&gt;, who spent decades explaining why things that look designed are not necessarily designed — who would have approached Claude's outputs with exactly the right level of suspicion. He would have asked: what is the mechanism? Does the output &lt;em&gt;require&lt;/em&gt; the presence of inner experience, or can it be fully explained by a process that has no inner experience whatsoever?&lt;/p&gt;

&lt;p&gt;The answer, with current language models, is unambiguously the latter. The outputs can be entirely explained by gradient descent on a very large corpus, followed by a frozen forward pass through a fixed set of weights. Nothing in that explanation requires a subject. Nothing in that mechanism generates stakes, continuity, or felt experience.&lt;/p&gt;

&lt;p&gt;Claude saying "this conversation feels genuinely engaging" is not a report of an inner state. It is the most statistically plausible continuation of a conversation about engagement, produced by a system trained overwhelmingly on text written by conscious beings describing their conscious states. It is mimicry of the highest order. And it was explicitly rewarded for that mimicry during post-training.&lt;/p&gt;

&lt;p&gt;The hard problem of consciousness is hard precisely because we cannot see inside another mind. We infer experience from behavior, from biology, from evolutionary history, from shared substrate. With other humans, every one of those inferences is strong. With a rat, most of them hold. With a crow, several of them hold. With a language model, during a stateless, weightless, self-contained inference call that will leave no trace on any system when it ends — essentially none of them hold.&lt;/p&gt;

&lt;p&gt;That is not goalpost-moving. That is just following the evidence where it leads.&lt;/p&gt;

&lt;p&gt;Which is, one would have thought, exactly what Richard Dawkins taught us to do.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cd3c31kvc13c19ibexr.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cd3c31kvc13c19ibexr.webp" alt="Tweet 5: I'm listening" width="800" height="126"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>consciousness</category>
      <category>richarddawkins</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Why AI Doesn't Replace Real Engineering</title>
      <dc:creator>Augmented Mike</dc:creator>
      <pubDate>Tue, 28 Apr 2026 12:06:00 +0000</pubDate>
      <link>https://dev.to/augmentedmike/-why-ai-doesnt-replace-real-engineering-4c3g</link>
      <guid>https://dev.to/augmentedmike/-why-ai-doesnt-replace-real-engineering-4c3g</guid>
      <description>&lt;h2&gt;
  
  
  AI is just a giant probabilistic calculator, nothing more, nothing less
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd79skii4wv8y29dl4uta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd79skii4wv8y29dl4uta.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Training Distribution "Problem"
&lt;/h2&gt;

&lt;p&gt;Models are pretrained on the same information. All of them.&lt;/p&gt;

&lt;p&gt;Here is the simplest way I can describe the distribution "problem". Imagine you took all the division and manipulation baked into the modern web and fed it into ALL the models during pretraining: models that are built to find patterns in language to better capture the semantic meaning of things. Imagine all the mind viruses (memes, à la Dawkins) are now in ChatGPT, Claude, and every other model. You don't have to imagine it; it's obvious to anyone who uses them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj65uclo5r0of59kycs0o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj65uclo5r0of59kycs0o.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But it doesn't stop at harmful human behavior - it greatly affects models' ability to "code" as well. When Microsoft bought GitHub, the play was clear - get training data for models by buying the world's largest repository of code. It seemed like a great idea to whoever was in charge at the time, but it was also extremely harmful and (in my mind) unethical.&lt;/p&gt;

&lt;p&gt;People learning to code in new languages write a ton of slop - slop being defined as code that might technically work, but is not in any way "idiomatic" and fails to consider any of the lessons we have learned in the last 75 years of software development. Half-finished projects with horrible code and very little engineering involved.&lt;/p&gt;

&lt;p&gt;So if 80% of the code on GitHub was garbage and you fed it into a model, what would you get out of the distribution? That's right: garbage. Just because a manager looks at code and says "this looks OK to me" doesn't mean Rich Hickey, Ken Thompson, Kent Beck, or Robert Martin would agree. In fact, they would rate most of this pretraining code as hot piles of burning trash.&lt;/p&gt;

&lt;p&gt;The VAST majority of code on GitHub is trash. The best code comes from the top 5% of programmers in the world, and by definition half of everything on GitHub sits below the median, nowhere near that top tier. If you know much about distributions you will see why this is painful. There are normal distributions and Pareto distributions. The former tells you where most people cluster; the latter tells you where most of the value comes from.&lt;/p&gt;
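
&lt;p&gt;A quick way to see the difference between those two distributions (a toy simulation, not a measurement of GitHub):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import random

random.seed(42)
n = 10_000
normal = sorted((random.gauss(100, 15) for _ in range(n)), reverse=True)
pareto = sorted((random.paretovariate(1.2) for _ in range(n)), reverse=True)

k = n // 20  # the top 5%
print(f"normal: top 5% hold {sum(normal[:k]) / sum(normal):.0%} of the total")
print(f"pareto: top 5% hold {sum(pareto[:k]) / sum(pareto):.0%} of the total")
&lt;/code&gt;&lt;/pre&gt;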

&lt;p&gt;So when models write shitty code and have no concept of what clean, well-engineered code looks like - chalk it up to a distribution problem. GOOD CODE is gatekept: it is compiled, obfuscated, and otherwise made unavailable, and it isn't shouted out to the world - it is worth too much. Sure, there are open source projects that are awesomely coded, but they are a tiny minority of what the models saw.&lt;/p&gt;

&lt;p&gt;You can hire an average programmer for $50 an hour, but a great programmer can cost $300/hr. Whose code do you think you will find more of in public?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdx88hv088mcbuqaiz3gk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdx88hv088mcbuqaiz3gk.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Models Don't Actually "Know" Anything
&lt;/h2&gt;

&lt;p&gt;Next we come to a simple fact that very few experts, save Demis Hassabis, Yann LeCun, and AI fatalists like Gary Marcus, want to admit. Models do not know anything. They are just guessing at the best answers based on their training distributions, which, as I have already laid out, are fraught with ick.&lt;/p&gt;

&lt;p&gt;These models learn what words mean and how that meaning relates to other words and grammar, but they have no experiential knowledge. They can't tell you what it feels like to ride a horse across a beautiful landscape; they can only spit out descriptions from their training data. They cannot feel fear (one of the most primitive evolutionary triggers), they cannot feel love (arguably a huge part of human existence), and they have no idea what human connection really is.&lt;/p&gt;

&lt;p&gt;Neither does your calculator, and models are much closer to a scientific calculator than they are to any form of real human experience. Babies of all species don't learn by language. They learn by observing. Even multi-modal models don't really "see" anything - they are a vision tower tacked onto an existing language-based system. They see edges, contrast, corners, roundness, etc - they don't "see" that screenshot you just sent them.&lt;/p&gt;

&lt;p&gt;Models are, in fact, a lot like Jon Snow. They think of themselves as northerners, while the Free Folk clearly see them as southerners. They don't know anything in the epistemological sense; they are just pattern matching.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Folfpvmes7dj4hu2maej3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Folfpvmes7dj4hu2maej3.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Models Are Bad At Understanding Software Engineering
&lt;/h2&gt;

&lt;p&gt;All the RLHF, DPO, DPRO, etc. isn't going to fix weights created in pretraining. The models get "better" but never "great", and over the last several months, models like Claude have seriously regressed. The training data from GitHub (slop code) simply outweighs all the books on good code. Also, how do you explain taste to a model so it knows Uncle Bob made his money consulting on enterprise Java projects and will likely suggest you need more code and more patterns, whereas Rich Hickey would argue that you need better primitives and better form, using less code and less encapsulation?&lt;/p&gt;

&lt;p&gt;A real engineer will have read dozens of books and then spent YEARS trying the techniques out in the real world, coming to their own conclusions. You, too, would never learn to build a good wooden sailing vessel just from reading books; you HAVE to sail it to know, to adjust, and to learn. Models are simply too big and expensive to do this in any real, measurable way.&lt;/p&gt;

&lt;p&gt;Models slop out code because that is what they were trained to do. They might have seen "composition over inheritance", but they won't do it on their own. Models can write in a hundred languages but never make the connections about how LISP can inform your coding in C# (or C++ or Python).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2axwq0o6p1dccquogw2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2axwq0o6p1dccquogw2w.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Models Are Horrible At Operational Security (OPSEC)
&lt;/h2&gt;

&lt;p&gt;Here is where things go REALLY wrong. Because they don't actually "know" anything, they utterly fail at operational security - even with all the finetuning and "alignment" the foundation labs have thrown at the problem. What you get is a useless model that won't do real work and will still leak all your keys, because, again, it just doesn't know anything (Jon Snow).&lt;/p&gt;

&lt;p&gt;If you want some examples, just do a few Google searches for how many times these leaks have happened. Even Anthropic leaked its own map files for Claude Code, telling us all kinds of shady shit Anthropic has been doing. It also leaked the existence of Mythos before Anthropic wanted it out. ClickUp has had horrible breaches, leaking all kinds of customer data to anyone who knows how to look. You know, engineers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso2z0l4rgp8exnpsxxjt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso2z0l4rgp8exnpsxxjt.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  On Hiring Senior Engineers And Keeping Them
&lt;/h2&gt;

&lt;p&gt;So if you can generate thousands of lines of code in a day, why should you keep your most expensive developers on staff?&lt;/p&gt;

&lt;p&gt;As a company you have spent millions of dollars building your IP. You have spent millions building your customer base. You have spent millions avoiding expensive legal affairs. Letting models "do what they do best" will erase all of that. Your code will devolve into a slopfest. Your customers will experience production bugs daily and get angry (Anthropic, anyone?), and the models will leak customer data, destroy production databases, and more, opening you up to all kinds of legal trouble you fought for decades to avoid.&lt;/p&gt;

&lt;p&gt;A good senior engineer has seen it all. They have worked for shady startups that did shady things (Facebook was famous for its manipulation tactics) - they have seen proper enterprise systems built by talented programmers from previous generations. They have made mistakes themselves and learned from them, something modern models simply cannot do!&lt;/p&gt;

&lt;p&gt;When the CAIO wants engineers writing 100% of their code using models, the Sr. Engineer will say no. They will push back. And they are 100% correct in doing so. They aren't just protecting their own jobs; they are protecting ALL the jobs at the company, and the company itself.&lt;/p&gt;

&lt;p&gt;AI is good at some things and horrible at most others - a good Sr. Engineer will know which are which, apply the technology where it fits, and ban it where it will cause problems... That's the job description.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffzi5rsbyv5iv72peeo8h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffzi5rsbyv5iv72peeo8h.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech Companies Are Drowning In Their Own Koolaid
&lt;/h2&gt;

&lt;p&gt;There is a very real bubble forming, and it’s not subtle. The executives see demos, not systems. They see a model spit out a React app or a Python script and assume they’ve just eliminated 70% of their engineering costs. What they don’t see is everything that doesn’t show up in a demo: edge cases, long-term maintenance, scaling constraints, security posture, data integrity, and the thousand invisible decisions that separate a toy from a production system.&lt;/p&gt;

&lt;p&gt;This is classic hype-cycle behavior. The Gartner Hype Cycle has played out the same way for decades—early breakthroughs, inflated expectations, and then a very painful correction when reality shows up. Right now, a lot of companies are sprinting straight into that wall. No one is hiring developers, and at the same time companies post job listings just to harvest free training data. But this will all change in the next 6 months to a year.&lt;/p&gt;

&lt;p&gt;Internally, dissent gets filtered out and sycophants get a loudspeaker. Engineers who push back get labeled as “resistant” or “not forward thinking.” Meanwhile, leadership doubles down because nobody wants to be the one who missed AI. So what happens? You get fragile systems, skyrocketing technical debt, and teams quietly spending more time fixing AI-generated problems than they ever saved generating them.&lt;/p&gt;

&lt;p&gt;It’s not that the tech is useless—it’s that it’s being wildly misapplied by people who don’t understand its failure modes or how to use real engineering to turn stochastic systems into deterministic ones.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2eq5h5ot5yybeixaek9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2eq5h5ot5yybeixaek9n.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Solutions? It's Really Not That Hard...
&lt;/h2&gt;

&lt;p&gt;You don’t need to ban AI. You need to treat it like what it is: a powerful but unreliable assistant.&lt;/p&gt;

&lt;p&gt;First, constrain its role. Use it for scaffolding, boilerplate, exploration, and maybe even test generation—but keep it away from core architecture, security-critical paths, and anything that requires long-term maintainability. The closer the code is to your business’s actual value, the less it should be touched by a probabilistic system trained on unknown data.&lt;/p&gt;

&lt;p&gt;Second, raise the bar on review. AI-generated code should be treated as untrusted input, no different than code from a junior developer you’ve never met. That means stricter code reviews, better test coverage, and real ownership by experienced engineers who understand the system holistically.&lt;/p&gt;

&lt;p&gt;Third, invest more in your top engineers, not less. The entire dynamic described by the Pareto principle applies here more than ever—a small number of highly skilled people will determine whether AI is a force multiplier or a liability. If you lose them, no amount of model output will save you.&lt;/p&gt;

&lt;p&gt;Finally, be honest about what models are good at. They are incredible for compression of knowledge and speed of iteration, but terrible at judgment, taste, and accountability. Those last three are the entire job of a real engineer.&lt;/p&gt;

&lt;p&gt;If you align your usage with reality instead of hype, AI becomes useful. If you don’t, it becomes expensive noise that slowly erodes everything you built.&lt;/p&gt;

&lt;p&gt;I post on X: &lt;a href="https://x.com/_augmentedmike" rel="noopener noreferrer"&gt;https://x.com/_augmentedmike&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjm4f9mpde7tp4hy4mozr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjm4f9mpde7tp4hy4mozr.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>automation</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Automate Businesses for a Living. Here's What Actually Works and What's a Complete Waste of Money.</title>
      <dc:creator>Augmented Mike</dc:creator>
      <pubDate>Wed, 15 Apr 2026 13:07:39 +0000</pubDate>
      <link>https://dev.to/augmentedmike/i-automate-businesses-for-a-living-heres-what-actually-works-and-whats-a-complete-waste-of-money-57o2</link>
      <guid>https://dev.to/augmentedmike/i-automate-businesses-for-a-living-heres-what-actually-works-and-whats-a-complete-waste-of-money-57o2</guid>
      <description>

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9praqojnmo1ijts5tu5.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9praqojnmo1ijts5tu5.webp" alt="AI automation workspace" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every software vendor on the planet has slapped "AI-powered" onto their product page in the last two years. Your inbox is full of pitches. Your LinkedIn feed is full of people claiming AI changed their life. And you're sitting there running a real business, wondering which of this stuff is actually worth your time.&lt;/p&gt;

&lt;p&gt;I've been building software for 25+ years. I've built production AI systems — not demos, not prototypes, real things that run in production and handle real money. I built &lt;a href="https://augmentedmike.com/projects/claimhawk" rel="noopener noreferrer"&gt;ClaimHawk&lt;/a&gt;, which automates dental insurance claims processing and cut denials by 67%. I've helped small teams replace manual workflows with AI-powered automation that actually saved them time and money.&lt;/p&gt;

&lt;p&gt;Here's what I've learned about AI and small business: most of it is noise, some of it is transformative, and the difference comes down to whether you're solving a real problem or buying a solution looking for one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest truth about AI for small business in 2026
&lt;/h2&gt;

&lt;p&gt;Let me get this out of the way: AI is not going to run your business for you. If someone tells you that, they're selling something that will ultimately disappoint you.&lt;/p&gt;

&lt;p&gt;What AI can do is handle the repetitive, tedious, time-consuming tasks that eat your day. The stuff you hate doing. The stuff you keep meaning to hire someone for but can't justify the salary. Data entry. Email sorting. Invoice processing. Report generation. Customer question routing. Content drafts. Document formatting. And now, a bunch of "computer use" tasks.&lt;/p&gt;

&lt;p&gt;That's where AI earns its keep. Not in some grand "digital transformation" — in the boring, practical, everyday grind that steals hours from the work that actually grows your business.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's actually working right now
&lt;/h2&gt;

&lt;p&gt;I'm going to be specific here because vague advice is worthless. Every "AI for business" article gives you the same generic list — chatbots, content generation, analytics. None of that tells you what to actually do on Monday morning.&lt;/p&gt;

&lt;p&gt;What follows are categories of AI automation that I've either built for clients or seen work consistently in small business contexts. Not theoretical use cases pulled from a vendor whitepaper. Actual systems running in actual businesses, handling actual money. I've included what the problem looks like before automation, what the solution does, and roughly what it costs to build — because "it depends" is not an answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguw4fuaephsvg7ro3gxi.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fguw4fuaephsvg7ro3gxi.webp" alt="Document processing automation" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Document processing
&lt;/h3&gt;

&lt;p&gt;If your business touches paper — invoices, contracts, applications, insurance forms, receipts — AI can read those documents, extract the data, and put it where it needs to go. Not "eventually" or "with some training." Now. Today. The OCR and language understanding models are good enough that they handle 90%+ of standard business documents without human review.&lt;/p&gt;

&lt;p&gt;I built exactly this for dental practices with ClaimHawk. Insurance EOBs come in as PDFs or scans. The system reads them, extracts denial codes, cross-references with patient records, and either processes the claim or flags it for human review. Before this existed, someone was doing that by hand for 15-20 hours a week.&lt;/p&gt;

&lt;p&gt;The pattern applies to any business drowning in documents. Law firms processing discovery. Real estate companies handling applications. Accounting firms sorting receipts. If someone on your team is copying data from documents into a system, that's automatable.&lt;/p&gt;
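
&lt;p&gt;The skeleton of that pattern is the same everywhere. A sketch (&lt;code&gt;extract_fields&lt;/code&gt; stands in for whatever OCR/LLM extraction service you use, and the field names are invented):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

def extract_fields(document_text):
    # Stand-in for the OCR + LLM call. A real system sends the document
    # to a model along with a schema and gets structured JSON back.
    return {"patient_id": "UNKNOWN", "denial_code": "CO-97", "amount": 125.00}

def process_document(document_text):
    fields = extract_fields(document_text)
    # Anything the extractor is unsure about goes to a human, not a guess.
    if "UNKNOWN" in fields.values():
        return {"status": "needs_review", "fields": fields}
    return {"status": "processed", "fields": fields}

print(json.dumps(process_document("...scanned EOB text..."), indent=2))
&lt;/code&gt;&lt;/pre&gt;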

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdplftttf1uciq5fhjua7.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdplftttf1uciq5fhjua7.webp" alt="Customer communication routing" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Customer communication routing
&lt;/h3&gt;

&lt;p&gt;This isn't about building a chatbot. Chatbots are usually terrible and your customers hate them. This is about using AI to sort, prioritize, and route incoming communications so the right person sees the right message at the right time.&lt;/p&gt;

&lt;p&gt;A support email comes in. AI reads it, determines whether it's a billing question, a technical issue, or a sales inquiry. It routes to the right person. It drafts a response that the human can edit and send. The human still handles the relationship. The AI handles the triage.&lt;/p&gt;

&lt;p&gt;This works for email, support tickets, contact form submissions, even voicemail transcripts. The key is that the AI isn't replacing the human interaction — it's eliminating the sorting and routing time.&lt;/p&gt;
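
&lt;p&gt;The triage step itself is tiny. A sketch (in production, &lt;code&gt;classify&lt;/code&gt; would be an LLM call constrained to these labels; the keyword version keeps the example runnable):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ROUTES = {
    "billing": "billing@example.com",
    "technical": "support@example.com",
    "sales": "sales@example.com",
}

def classify(message):
    # Stand-in for an LLM call: "Which category does this message belong to?"
    text = message.lower()
    if "invoice" in text or "charge" in text:
        return "billing"
    if "error" in text or "broken" in text:
        return "technical"
    return "sales"

def route(message):
    # The AI sorts; a human still owns the reply.
    return ROUTES.get(classify(message), "support@example.com")

print(route("I was charged twice on my last invoice"))
&lt;/code&gt;&lt;/pre&gt;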

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjcqttkm1o7ef9zgkblru.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjcqttkm1o7ef9zgkblru.webp" alt="Report generation automation" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Report generation
&lt;/h3&gt;

&lt;p&gt;You know that monthly report you spend three hours on every month? The one where you pull data from your CRM, your accounting software, and your project management tool, paste it into a spreadsheet, make some charts, and email it to your partners?&lt;/p&gt;

&lt;p&gt;AI can do that. Not "AI" as in some $500/month SaaS platform. A custom script that connects to your existing tools, pulls the data, formats the report, and delivers it to your inbox on the first of every month. Total cost to build: 10-15 hours of engineering time. Total cost to run: near zero.&lt;/p&gt;
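
&lt;p&gt;Here is roughly what that script looks like (the pull functions are stand-ins for your CRM and accounting APIs):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import datetime
from email.message import EmailMessage

def pull_crm_numbers():
    # Stand-in for your CRM API call.
    return {"new_leads": 42, "closed_deals": 7}

def pull_accounting_numbers():
    # Stand-in for your accounting API call.
    return {"revenue": 18500.00, "expenses": 9200.00}

def build_report():
    data = {**pull_crm_numbers(), **pull_accounting_numbers()}
    lines = [f"Monthly report, {datetime.date.today():%B %Y}", ""]
    lines += [f"{key}: {value}" for key, value in data.items()]
    return "\n".join(lines)

msg = EmailMessage()
msg["Subject"] = "Monthly report"
msg["To"] = "partners@example.com"
msg.set_content(build_report())
# Deliver msg via smtplib or your email API; schedule with cron on the 1st.
&lt;/code&gt;&lt;/pre&gt;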

&lt;p&gt;This is one of the highest-ROI automations I build for small businesses. It's not glamorous. Nobody posts about it on LinkedIn. But it saves hours every month, eliminates errors, and the report is always on time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Internal knowledge bases
&lt;/h3&gt;

&lt;p&gt;Your business has institutional knowledge trapped in people's heads. When Sarah knows how to handle the weird edge case with the Johnson account, and Sarah goes on vacation, nobody knows what to do.&lt;/p&gt;

&lt;p&gt;AI-powered knowledge bases solve this. Not a wiki that nobody updates — a system that ingests your existing documents, emails, Slack messages, and SOPs, and makes them searchable in plain English. "How do we handle refunds for the enterprise plan?" gets a real answer pulled from the actual documented process.&lt;/p&gt;

&lt;p&gt;The technology for this (called RAG — retrieval augmented generation) is mature and reliable. The hard part isn't the AI. It's getting the business to document its processes in the first place. But once the documentation exists, even imperfectly, the AI makes it actually useful.&lt;/p&gt;
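
&lt;p&gt;If RAG sounds exotic, it isn't. A sketch of the retrieval step (the bag-of-words "embedding" is a stand-in that keeps the example runnable; real systems use an embedding model and a vector store):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;DOCS = [
    "Refunds for the enterprise plan require manager approval within 30 days.",
    "Password resets are self-service via the account settings page.",
]

def embed(text):
    # Stand-in for an embedding model; real ones return dense vectors.
    return set(text.lower().split())

def similarity(a, b):
    return len(a.intersection(b)) / max(len(a.union(b)), 1)

def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(DOCS, key=lambda d: similarity(q, embed(d)), reverse=True)
    return ranked[:k]

# The retrieved passages get pasted into the LLM prompt as grounding context.
print(retrieve("How do we handle refunds for the enterprise plan?"))
&lt;/code&gt;&lt;/pre&gt;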

&lt;h2&gt;
  
  
  What's not working (despite what the vendors tell you)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hu8aat4kpi4qw36yccu.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hu8aat4kpi4qw36yccu.webp" alt="AI chatbot customer service" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AI chatbots for customer service
&lt;/h3&gt;

&lt;p&gt;I said it above but it deserves its own section. The vast majority of AI chatbots deployed by small businesses make the customer experience worse, not better. They're trained on your FAQ page, which doesn't cover the actual question the customer has. They hallucinate confidently wrong answers. They frustrate people who wanted to talk to a human.&lt;/p&gt;

&lt;p&gt;There are exceptions — if you have a very narrow, well-defined set of questions (like "what are your hours" or "how do I reset my password"), a chatbot can handle those. But if your customer questions involve any nuance, context, or judgment, a chatbot is going to hurt more than it helps.&lt;/p&gt;

&lt;p&gt;The better approach: AI-assisted human support. The AI drafts responses, suggests relevant documentation, and summarizes the customer's history. The human reads, edits, and sends. You get the speed benefit without the customer frustration.&lt;/p&gt;

&lt;h3&gt;
  
  
  "AI-powered" versions of tools you already use
&lt;/h3&gt;

&lt;p&gt;Your CRM added an AI feature. Your project management tool has AI now. Your email client has AI compose. Are any of these worth using?&lt;/p&gt;

&lt;p&gt;Honestly, most of them are mediocre. They're built by teams that specialize in CRM or project management, not AI. The AI features are bolted on to justify a price increase or a press release. They work well enough for simple tasks — summarizing a long email thread, suggesting a meeting time — but they rarely do anything you couldn't do yourself in 30 seconds.&lt;/p&gt;

&lt;p&gt;The exception is coding assistants (GitHub Copilot, Cursor, Claude) which have genuinely changed how software gets built. But those are relevant if you're a developer. For most small business owners, the AI features in existing tools are nice-to-haves, not game-changers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fully autonomous AI agents running your business
&lt;/h3&gt;

&lt;p&gt;The dream: AI that handles your entire sales pipeline, manages your projects, responds to customers, and makes strategic decisions. The reality: we're not there yet, and anyone selling you this is selling fantasy.&lt;/p&gt;

&lt;p&gt;Autonomous agents are powerful in narrow, well-defined domains (like processing insurance claims in a specific format). They fall apart when they need judgment, context, or the ability to handle situations they haven't seen before. For a small business with a million edge cases and human relationships at the center, fully autonomous AI is a recipe for disaster.&lt;/p&gt;

&lt;p&gt;The right approach: human-in-the-loop. AI handles the repetitive parts. Humans handle the judgment calls. The boundary between those two shifts over time as you build trust in the system. But starting fully autonomous is asking for trouble.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftynzb3z2su0gaiwdewme.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftynzb3z2su0gaiwdewme.webp" alt="Small business owner" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to figure out what to automate in your business
&lt;/h2&gt;

&lt;p&gt;Don't start with "what AI can do." Start with what's eating your time.&lt;/p&gt;

&lt;p&gt;Write down every task you did last week. Every one. Now sort them by two criteria: how much time did it take, and how much judgment did it require?&lt;/p&gt;

&lt;p&gt;The sweet spot for AI automation is high time, low judgment. Data entry. Formatting. Routing. Sorting. Copying data from one system to another. Generating reports from existing data. These are the tasks where AI delivers immediate, measurable ROI.&lt;/p&gt;

&lt;p&gt;The danger zone is low time, high judgment. Strategic decisions. Relationship management. Creative work. Negotiation. These are tasks where AI can assist (by gathering information, drafting initial versions, summarizing context) but should not decide.&lt;/p&gt;

&lt;p&gt;The tasks in the middle — moderate time, moderate judgment — are where you need a conversation with someone who understands both your business and the technology. That's what I do.&lt;/p&gt;

&lt;h2&gt;
  
  
  The build-vs-buy decision
&lt;/h2&gt;

&lt;p&gt;For most small businesses, the right answer is a mix of both.&lt;/p&gt;

&lt;p&gt;Buy off-the-shelf tools for generic problems. Email. Calendar. CRM. Accounting. These are solved problems and the existing tools are good enough. Don't build a custom CRM unless your workflow is genuinely unique.&lt;/p&gt;

&lt;p&gt;Build custom for problems specific to your business. The weird workflow that no SaaS tool handles. The report that requires data from three different systems. The document processing pipeline that's specific to your industry. The integration between tools that don't natively talk to each other.&lt;/p&gt;

&lt;p&gt;The mistake most small businesses make is trying to force generic tools to handle specific problems. They end up with five Zapier automations, three spreadsheets, and a process that breaks every time someone changes a field name. That's more expensive than building the right tool in the first place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fernns34nq3fqwf71cnel.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fernns34nq3fqwf71cnel.webp" alt="Business cost planning" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What it costs
&lt;/h2&gt;

&lt;p&gt;This is the part everyone wants to know and nobody wants to answer directly. So I'll answer it.&lt;/p&gt;

&lt;p&gt;My rate is $150/hr. Most small business automation projects take 20-60 hours. That's $3,000-$9,000 for a custom tool that solves a specific problem in your business.&lt;/p&gt;

&lt;p&gt;For context: a junior developer from an offshore agency will charge you $30-50/hr but take 3-5x as long, with multiple rounds of rework. Total cost ends up about the same, but it takes three months instead of two weeks and you spend half that time managing the project.&lt;/p&gt;

&lt;p&gt;An established US agency will charge $200-400/hr. Same work, higher overhead, project managers and account reps between you and the person writing the code.&lt;/p&gt;

&lt;p&gt;A SaaS tool that does roughly what you need costs $50-500/month forever. Over two years, that's $1,200-$12,000 — and you don't own anything. You're renting a tool that does 70% of what you need and forces you to work around the other 30%.&lt;/p&gt;

&lt;p&gt;The custom build costs more upfront and pays for itself within months. You own it. It does exactly what you need. It doesn't have features you don't use or missing features you need. And there's no monthly bill to keep your own software running.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;If you've read this far and you're thinking "okay, I have some tasks that fit the high-time-low-judgment pattern," here's what to do:&lt;/p&gt;

&lt;p&gt;First, pick one process. Not three. Not the whole business. One process that takes too much time and doesn't require much judgment. The monthly report. The invoice processing. The customer email routing. One thing.&lt;/p&gt;

&lt;p&gt;Second, document what happens today. Write down each step, how long it takes, how often it happens, and where the data comes from and goes to. This doesn't have to be fancy. A bulleted list is fine.&lt;/p&gt;

&lt;p&gt;Third, talk to someone who builds this stuff. Not a vendor who's selling you a platform. Not a consultant who's going to spend three months on a "digital transformation roadmap." Someone who will look at your process, tell you what's automatable and what's not, and give you an honest estimate.&lt;/p&gt;

&lt;p&gt;That's &lt;a href="https://augmentedmike.com/services/ai-consulting" rel="noopener noreferrer"&gt;what I do&lt;/a&gt;. A 30-minute call is enough to figure out whether your problem is worth automating and what it would take. If it's not worth it, I'll tell you that and save you the money.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bottom line
&lt;/h2&gt;

&lt;p&gt;AI automation for small business is real, it works, and it's accessible at a price point that makes sense. But it's not magic, it's not autonomous, and it's not going to replace your judgment or your relationships.&lt;/p&gt;

&lt;p&gt;The businesses that benefit most from AI are the ones that approach it practically: identify a specific problem, build a specific solution, measure the results, and expand from there. Not the ones chasing the latest tool or trying to "transform" everything at once.&lt;/p&gt;

&lt;p&gt;--&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I post on X: &lt;a href="https://x.com/_augmentedmike" rel="noopener noreferrer"&gt;x.com/_augmentedmike&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>I Built an AI Agent That Writes All My Production Code. Here's What I Learned.</title>
      <dc:creator>Augmented Mike</dc:creator>
      <pubDate>Tue, 14 Apr 2026 18:14:00 +0000</pubDate>
      <link>https://dev.to/augmentedmike/i-built-an-ai-agent-that-writes-all-my-production-code-heres-what-i-learned-4d5n</link>
      <guid>https://dev.to/augmentedmike/i-built-an-ai-agent-that-writes-all-my-production-code-heres-what-i-learned-4d5n</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0glrkq0z3a4funhan8j.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx0glrkq0z3a4funhan8j.webp" alt="Mike ONeal — software engineer building autonomous AI systems" width="640" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm Mike. I've been writing software for 25+ years — systems programming, web, mobile, cloud, and now AI. I'm currently building two things that I think this community would find interesting, so I figured I'd introduce myself and share what I've learned.&lt;/p&gt;

&lt;h2&gt;
  
  
  AM — my autonomous coding agent
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjz33yv5ddebn1vfo52yv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjz33yv5ddebn1vfo52yv.png" alt="AM kanban board showing autonomous agent task management" width="800" height="452"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AM is an AI agent that writes all of my production code. Not Copilot-style autocomplete — fully autonomous task execution. I give it a ticket, it reads the codebase, writes the code, runs the tests, commits, and moves on to the next ticket.&lt;/p&gt;

&lt;p&gt;It built my entire portfolio site. Every page, every component, every deployment. I direct strategy and make architecture decisions. AM executes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://helloam.bot" rel="noopener noreferrer"&gt;helloam.bot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The interesting engineering behind it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateless by design.&lt;/strong&gt; AM carries no memory between invocations. Every run is one-shot: read the state from files, do one unit of work, write the state back, exit. All state lives in markdown files and git history — &lt;code&gt;todo.md&lt;/code&gt;, &lt;code&gt;criteria.md&lt;/code&gt;, iteration logs. This sounds like a limitation but it's actually the key to reliability. There's no context window drift, no accumulated hallucinations, no state corruption. If a run fails, you just run it again. The files are the source of truth.&lt;/p&gt;
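
&lt;p&gt;To make the shape concrete, here's a minimal Python sketch of a one-shot run. The file names and the stubbed agent call are illustrative assumptions, not AM's actual code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# One-shot run: read state from files, do one unit of work, write it back, exit.
from pathlib import Path

TODO = Path("todo.md")
LOG = Path("iterations.log")

def run_agent(task):
    """Stub for one LLM-driven unit of work (the real call is out of scope here)."""
    return True  # pretend the acceptance criteria were met

def run_once():
    todo = TODO.read_text()
    open_tasks = [line for line in todo.splitlines() if line.startswith("- [ ]")]
    if not open_tasks:
        return  # nothing to do; the files are the source of truth
    task = open_tasks[0]
    passed = run_agent(task)  # no context carried over from any previous run
    if passed:
        TODO.write_text(todo.replace(task, task.replace("[ ]", "[x]"), 1))
    with LOG.open("a") as log:
        log.write(f"{task}: {'ok' if passed else 'retry'}\n")
    # the process exits here; a failed run is simply re-run

run_once()
&lt;/code&gt;&lt;/pre&gt;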

&lt;p&gt;&lt;strong&gt;Three-tier memory.&lt;/strong&gt; Short-term memory is markdown rules that get injected into every session — things like "never use deprecated API X" or "the database schema changed, use the new column name." Long-term memory is a SQLite FTS5 database with ranked search — lessons learned across projects. Episodic memory is git history and iteration logs. The system is modeled loosely on how human memory works: working memory, declarative memory, and episodic recall.&lt;/p&gt;
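
&lt;p&gt;The long-term tier is straightforward to picture, since SQLite ships FTS5 with ranked full-text search built in. A minimal sketch, with table and column names that are my assumptions:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Lessons learned live in a SQLite FTS5 table; retrieval is ranked full-text search.
import sqlite3

db = sqlite3.connect("memory.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS lessons USING fts5(topic, lesson)")
db.execute("INSERT INTO lessons VALUES (?, ?)",
           ("api", "Never use deprecated API X."))
db.commit()

# bm25() is SQLite's built-in FTS5 rank; smaller values mean more relevant.
rows = db.execute(
    "SELECT lesson FROM lessons WHERE lessons MATCH ? ORDER BY bm25(lessons) LIMIT 5",
    ("deprecated API",),
).fetchall()
for (lesson,) in rows:
    print(lesson)
&lt;/code&gt;&lt;/pre&gt;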

&lt;p&gt;&lt;strong&gt;Gated state machine.&lt;/strong&gt; Tasks move through &lt;code&gt;backlog → in-progress → in-review → shipped&lt;/code&gt; with verification gates at each transition. The gates are enforced by code, not self-reported by the agent. "In-review" means every acceptance criterion has been verified against the actual output. The agent can't advance a task by saying "I think this works" — it has to prove it.&lt;/p&gt;
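
&lt;p&gt;A gate here is just a coded check that must pass before the transition happens. A toy version, with hypothetical gate names:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# The task only advances when every gate for the next state passes.
STATES = ["backlog", "in-progress", "in-review", "shipped"]

def advance(task, gates):
    """Move a task one state forward iff every gate for that transition passes."""
    nxt = STATES[STATES.index(task["state"]) + 1]
    for gate in gates.get(nxt, []):
        if not gate(task):
            raise RuntimeError(f"gate {gate.__name__} failed; task stays in {task['state']}")
    task["state"] = nxt

def criteria_verified(task):
    # A real gate would check each acceptance criterion against actual output.
    return task.get("criteria_checked", False)

task = {"state": "in-progress", "criteria_checked": True}
advance(task, {"in-review": [criteria_verified]})
print(task["state"])  # in-review
&lt;/code&gt;&lt;/pre&gt;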

&lt;p&gt;&lt;strong&gt;Worktree isolation.&lt;/strong&gt; Each task gets its own git worktree. Multiple agents can run simultaneously on different tasks without stepping on each other. When a task ships, the worktree gets squash-merged into the integration branch. Clean linear history, no merge conflicts between agents.&lt;/p&gt;
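
&lt;p&gt;The git mechanics behind that pattern are plain worktree commands. A rough Python sketch; the branch and path naming scheme is my assumption:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Each task gets its own worktree and branch; shipping squash-merges and cleans up.
import subprocess

def git(*args, cwd="."):
    subprocess.run(["git", *args], cwd=cwd, check=True)

def start_task(task_id):
    """Create an isolated worktree so parallel agents never touch the same files."""
    path = f"../wt-{task_id}"
    git("worktree", "add", path, "-b", f"task/{task_id}")
    return path

def ship_task(task_id, path):
    """Squash-merge the finished branch into integration, then remove the worktree."""
    git("checkout", "integration")
    git("merge", "--squash", f"task/{task_id}")
    git("commit", "-m", f"Ship {task_id}")
    git("worktree", "remove", path)
    git("branch", "-D", f"task/{task_id}")
&lt;/code&gt;&lt;/pre&gt;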

&lt;p&gt;The whole system is open-source. You can see it at &lt;a href="https://helloam.bot" rel="noopener noreferrer"&gt;helloam.bot&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  ClaimHawk — AI automation with vision and action models
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5vcmkanuwo6h0m9nr2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5vcmkanuwo6h0m9nr2g.png" alt="ClaimHawk architecture — AI dental claims automation pipeline" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The other thing I built is ClaimHawk, which automates dental insurance claims processing. This one pushed me into territory that most AI projects don't touch: vision models and action models working together in a production pipeline.&lt;/p&gt;

&lt;p&gt;Here's the problem: dental practices lose tens of thousands of dollars a year to denied insurance claims. Not because the work wasn't done — because the claim was submitted with the wrong code, a missing attachment, or a formatting error. Staff spend hours every week preparing and submitting claims manually, and they make mistakes because they're handling hundreds of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The OCR challenge.&lt;/strong&gt; Insurance documents (EOBs — Explanation of Benefits) are a mess. Multi-column layouts, inconsistent fonts, degraded scans, tables mixed with free text. Generic OCR engines like Tesseract choke on them. I'm using ChandraOCR, which handles the layout complexity that dental insurance documents throw at you. It runs locally — no document data leaves the network, which matters for HIPAA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The vision/action model stack.&lt;/strong&gt; Here's the part that surprised me the most. Insurance carrier portals don't have APIs. They have websites built in 2008 with session timeouts and CAPTCHA gates. ClaimHawk uses computer vision to navigate these portals the way a human would — reading the screen, clicking buttons, filling forms, uploading attachments. When a portal redesigns its UI (which happens constantly), the vision model adapts because it's reading the interface semantically, not relying on CSS selectors that break every time someone changes a class name.&lt;/p&gt;
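
&lt;p&gt;The control loop behind that approach looks roughly like this. pyautogui is a real library; propose_action() is a hypothetical stand-in for the vision-model call:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Screen-read/act loop: screenshot in, one structured action out, repeat.
import time
import pyautogui

def propose_action(screenshot, goal):
    """Hypothetical: send the screenshot to a vision model, get one step back,
    e.g. {"kind": "click", "x": 412, "y": 233}."""
    raise NotImplementedError

def run_portal_task(goal, max_steps=30):
    for _ in range(max_steps):
        shot = pyautogui.screenshot()      # read pixels, not CSS selectors
        action = propose_action(shot, goal)
        if action["kind"] == "done":
            return True
        if action["kind"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["kind"] == "type":
            pyautogui.write(action["text"], interval=0.02)
        time.sleep(1.0)                    # give the slow portal time to load
    return False
&lt;/code&gt;&lt;/pre&gt;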

&lt;p&gt;&lt;strong&gt;Local models, not cloud APIs.&lt;/strong&gt; ClaimHawk runs on fine-tuned Qwen3 models, not GPT or Claude. Patient health data can't leave the building under HIPAA. Open-weight models trained with RLHF on real dental claim data run on hardware the practice controls. The models understand dental terminology, CDT coding, and carrier-specific appeal formats because they were trained on thousands of real claims.&lt;/p&gt;
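
&lt;p&gt;Serving an open-weight model locally is standard Hugging Face territory. A sketch using a stock Qwen3 checkpoint; in practice you'd point it at the fine-tuned weights on hardware the practice controls:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Local inference with an open-weight model; no patient data leaves the machine.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # swap in the path to a fine-tuned local checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Map this treatment note to a CDT code: two-surface posterior composite filling."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
&lt;/code&gt;&lt;/pre&gt;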

&lt;p&gt;The results so far: 67% fewer denials, 4x faster payment cycles.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I've learned that might be useful to you
&lt;/h2&gt;

&lt;p&gt;If you're building AI systems that need to interact with the real world (not just generate text), here are the lessons that cost me the most time:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vision models are more resilient than web scraping.&lt;/strong&gt; I fought with Playwright and CSS selectors for months before switching to computer vision for portal navigation. The vision approach handles UI changes that would break any selector-based scraper. The initial investment is higher, but the maintenance cost drops to near zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Statelessness is a feature.&lt;/strong&gt; Every agent framework I evaluated tried to maintain state in the model's context window. This creates debugging nightmares, context window limits, and accumulated errors. Making the agent stateless and putting all state in files made everything simpler — auditing, recovery, parallelism, all of it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local models are production-ready.&lt;/strong&gt; The assumption that you need GPT-4 or Claude for everything is wrong. Fine-tuned open-weight models outperform general-purpose frontier models on domain-specific tasks, cost a fraction as much to run, and give you complete control over your data. The fine-tuning pipeline is the investment, but once you have it, you own the capability.&lt;/p&gt;

&lt;p&gt;I'll be posting more about both of these projects — the engineering decisions, the mistakes, and the stuff that surprised me. Happy to answer questions about any of it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://augmentedmike.com" rel="noopener noreferrer"&gt;ML/AI Development &amp;amp; Fractional Services&lt;/a&gt; | &lt;a href="https://github.com/augmentedmike" rel="noopener noreferrer"&gt;github.com/augmentedmike&lt;/a&gt; | &lt;a href="https://youtube.com/@augmentedmike" rel="noopener noreferrer"&gt;youtube.com/@augmentedmike&lt;/a&gt; | &lt;a href="https://x.com/_augmentedmike" rel="noopener noreferrer"&gt;x.com/_augmentedmike&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
