<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Denis Moroz</title>
    <description>The latest articles on DEV Community by Denis Moroz (@denis_moroz).</description>
    <link>https://dev.to/denis_moroz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3901634%2F1826b423-d6b8-40da-864c-423ca100c9e3.png</url>
      <title>DEV Community: Denis Moroz</title>
      <link>https://dev.to/denis_moroz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/denis_moroz"/>
    <language>en</language>
    <item>
      <title>What's Actually Happening in AI Right Now (Explained Like I'm Talking to a Friend)</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:11:52 +0000</pubDate>
      <link>https://dev.to/denis_moroz/whats-actually-happening-in-ai-right-now-explained-like-im-talking-to-a-friend-1h2h</link>
      <guid>https://dev.to/denis_moroz/whats-actually-happening-in-ai-right-now-explained-like-im-talking-to-a-friend-1h2h</guid>
      <description>&lt;p&gt;You've seen the headlines. "AI breaks record." "New model released." "Safety protocol triggered." It's a lot. Most of it is written for people who already know what they're reading about.&lt;/p&gt;

&lt;p&gt;I'm going to fix that. Here are the three biggest things happening in AI right now, explained the way I'd explain them to a friend over coffee.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Anthropic Built Something So Powerful They Decided Not to Release It
&lt;/h2&gt;

&lt;p&gt;This is the one that stopped me in my tracks.&lt;/p&gt;

&lt;p&gt;Anthropic — the company behind Claude, one of the most capable AI assistants out there — confirmed they built a new model called Claude Mythos 5. It's the first AI model to cross the 10-trillion-parameter mark, which is a way of saying it's genuinely enormous in terms of complexity.&lt;/p&gt;

&lt;p&gt;And then they didn't release it.&lt;/p&gt;

&lt;p&gt;Why? Because it triggered their internal ASL-4 safety protocol. ASL-4 is a classification Anthropic uses for models that are approaching capabilities they consider genuinely dangerous — not "it might write mean emails" dangerous, but "this could contribute to mass-casualty-level events in the wrong hands" dangerous.&lt;/p&gt;

&lt;p&gt;Here's what's remarkable about this: a company voluntarily shelved a product they probably spent hundreds of millions of dollars building because it crossed their own safety red lines.&lt;/p&gt;

&lt;p&gt;You can read that two ways. Cynically: it's a PR move — they get credit for being responsible while staying competitive. Generously: this is exactly how you'd want a powerful AI company to behave.&lt;/p&gt;

&lt;p&gt;I lean generous, but I'm watching closely. The fact that this conversation is happening at all tells you we're entering genuinely new territory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this means for you:&lt;/strong&gt; Nothing changes in your day-to-day AI use. The Claude you use (including Claude.ai and apps built on it) is a different model. But the fact that a major AI lab hit a self-imposed safety ceiling is worth knowing. It sets a precedent.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. GPT-5.4 Is Out, and It's the Most Capable Public AI Model I've Ever Used
&lt;/h2&gt;

&lt;p&gt;On the other end of the spectrum: OpenAI shipped GPT-5.4 in March, and it's the real deal.&lt;/p&gt;

&lt;p&gt;Previous AI models were specialists. You'd use one for coding, another for writing, another for research. GPT-5.4 is the first public model that leads across all those categories at once — coding, reasoning, writing, knowing things, using your computer. One model, no tradeoffs.&lt;/p&gt;

&lt;p&gt;The "Thinking" version of GPT-5.4 scored 75% on a benchmark called OSWorld-Verified, which tests how well AI can complete real desktop tasks (booking a flight, editing a spreadsheet, that kind of thing). That's a 28-point jump over the previous version and better than most humans score on the same test.&lt;/p&gt;

&lt;p&gt;I've been using it. The honest take: it's noticeably better at staying on task for complex, multi-step things. It's less likely to hallucinate in ways that feel plausible but are wrong. And it's faster than I expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this means for you:&lt;/strong&gt; If you're using any AI assistant for work, now is a good time to test GPT-5.4 if you haven't. Whether it's worth upgrading your subscription depends on what you're using AI for — but for anything involving reasoning or multi-step tasks, it's a meaningful upgrade.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Someone Figured Out How to Make AI Use 100x Less Energy
&lt;/h2&gt;

&lt;p&gt;This one doesn't have a brand name attached to it, which is probably why you haven't heard about it. But it matters.&lt;/p&gt;

&lt;p&gt;A research team published a paper showing that combining neural networks (the math-heavy approach most modern AI uses) with old-school symbolic reasoning (basically, logic rules that humans write) can cut AI's energy consumption by a factor of up to 100 while actually improving accuracy.&lt;/p&gt;

&lt;p&gt;To put that in perspective: AI training currently consumes roughly the same electricity as small countries. Data centers running AI are one of the fastest-growing sources of electricity demand worldwide. If this approach scales — and that's still an if — it could reshape the economics and environmental footprint of the entire industry.&lt;/p&gt;

&lt;p&gt;This isn't a product. It's a research result. It'll take years to show up in things you use. But it's the kind of foundational shift that looks obvious in retrospect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What this means for you:&lt;/strong&gt; Nothing immediately. But if you care about AI being sustainable long-term (and you should, because runaway energy costs put a ceiling on how far this technology can go), this is early good news.&lt;/p&gt;




&lt;h2&gt;
  
  
  The One-Sentence Summary
&lt;/h2&gt;

&lt;p&gt;April 2026 in AI: one company built something too powerful to release, one company released the most powerful public model ever, and researchers found a way to make all of it a lot cheaper to run.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;That's it for this week. If you found this useful, forward it to one person who keeps asking you what's going on in AI. That's the whole goal here — making this stuff make sense.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Next up: the AI tools I actually use every week (and the ones I've deleted). Dropping in a few days.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; AI news, AI explained, ChatGPT, Claude, Anthropic, OpenAI, non-technical AI&lt;/p&gt;

</description>
      <category>ainews</category>
      <category>aiexplained</category>
      <category>chatgpt</category>
      <category>claude</category>
    </item>
    <item>
      <title>What the Latest AI Release Actually Means for You</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:11:16 +0000</pubDate>
      <link>https://dev.to/denis_moroz/what-the-latest-ai-release-actually-means-for-you-4n6n</link>
      <guid>https://dev.to/denis_moroz/what-the-latest-ai-release-actually-means-for-you-4n6n</guid>
      <description>&lt;p&gt;There's an AI announcement almost every week now. New model, new benchmark, new capability that sounds transformational in the press release and lands somewhere between "genuinely useful" and "interesting but not for me" in real life.&lt;/p&gt;

&lt;p&gt;This week's release worth paying attention to: &lt;strong&gt;Claude Opus 4.7&lt;/strong&gt; from Anthropic — a frontier model aimed at the top of the reasoning capability ladder. Here's what it actually means, without the jargon.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Was Announced
&lt;/h2&gt;

&lt;p&gt;Anthropic released Opus 4.7, the latest in their flagship model line. The headline claim: significantly improved reasoning on hard, multi-step problems — the kind where you have to hold a lot of variables in mind at once before arriving at an answer.&lt;/p&gt;

&lt;p&gt;Benchmark numbers have it near the top of the field. Coding tasks, complex writing, document analysis with long contexts, logical reasoning. That's the "what."&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Actually Means
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For most people using AI tools day to day: modest improvement on hard things.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you use AI for relatively straightforward tasks — drafting emails, summarizing documents, answering questions — you probably won't notice a dramatic difference. These models were already very good at those tasks. Better reasoning helps at the margins, but the ceiling on those tasks was already high.&lt;/p&gt;

&lt;p&gt;Where you'll actually notice it: problems that have previously frustrated you with AI. A legal document you needed help parsing but the AI kept losing track of the argument structure. A complex data analysis where it would arrive at a reasonable-sounding but wrong conclusion. Code that spans multiple files and requires understanding how the pieces connect. Those tasks get meaningfully better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For developers and teams using the API: more capable, more expensive.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Frontier models cost more to run. That's not a criticism — the capability justification is real — but it means the economics of AI at scale get recalculated with every new release. Teams that built products on cheaper models will face a choice: upgrade and pay more per query, or stay on the older model and accept the capability gap. Neither option is wrong, but both require a real decision.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For the "AI is overrated" crowd: the capability ceiling keeps rising.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The thing I've noticed covering AI releases is that each new frontier model settles into "the baseline" within 6–12 months. What feels like a remarkable capability today becomes an assumption people make about AI tools in general, and then they want more. That's not a bad thing — it's just worth knowing that "current AI isn't that impressive" is a statement that gets less true with each cycle.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part Most Coverage Gets Wrong
&lt;/h2&gt;

&lt;p&gt;Every AI release is described as a step-change. Few of them actually are for most users.&lt;/p&gt;

&lt;p&gt;The honest read on Opus 4.7: it's a meaningful improvement for power users and applications where reasoning depth matters. It's not a transformation in how most people experience AI tools, because most people use AI for tasks that were already well within reach of the previous generation.&lt;/p&gt;

&lt;p&gt;The pattern I keep seeing: researchers and engineers notice the improvement immediately because they push models hard on difficult tasks. Casual users often don't notice because they're using AI in ways that don't stress-test the difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So what?&lt;/strong&gt; If you're evaluating whether to upgrade, test it on the specific tasks that have frustrated you before — not the tasks that were already easy. That's where the new capability shows up.&lt;/p&gt;




&lt;h2&gt;
  
  
  The One Thing to Watch
&lt;/h2&gt;

&lt;p&gt;The more interesting signal in the Opus 4.7 release isn't the benchmark numbers — it's the continued improvement in how long and reliably these models can maintain a complex conversation or task.&lt;/p&gt;

&lt;p&gt;Context handling is the quiet upgrade that matters more than most people track. A model that can hold a longer, more coherent thread of reasoning without drifting is practically useful in ways that don't show up in headlines. Document review, long research tasks, extended coding sessions — these all get better when the model can hold more without losing the thread.&lt;/p&gt;

&lt;p&gt;Watch for that. It compounds.&lt;/p&gt;




&lt;h2&gt;
  
  
  Short Version
&lt;/h2&gt;

&lt;p&gt;Opus 4.7 is real and represents genuine progress at the high end of AI reasoning. For everyday AI use, the change is incremental. For complex, multi-step, or high-stakes tasks, it's a meaningful step up. Test it on your hard problems, not your easy ones.&lt;/p&gt;

&lt;p&gt;And next week there will be another announcement. This is just the pace we're at now.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next week: the AI productivity stack that actually fits a busy life — workflow, not wishlist.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidecoded</category>
      <category>ainews</category>
      <category>honesttakes</category>
    </item>
    <item>
      <title>The CTO Who Codes</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:10:40 +0000</pubDate>
      <link>https://dev.to/denis_moroz/the-cto-who-codes-ng9</link>
      <guid>https://dev.to/denis_moroz/the-cto-who-codes-ng9</guid>
      <description>&lt;p&gt;At some point in every engineering leader's career, someone tells them: &lt;em&gt;"You should stop coding. That's not your job anymore."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It's well-intentioned advice. It usually comes from people who watched a VP of Engineering ship a feature and break the sprint by pulling the team into code review politics. The advice is right about the failure mode. It's wrong about the cure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Coding" Actually Means at a Leadership Level
&lt;/h2&gt;

&lt;p&gt;When I say I still code, I don't mean I'm writing 500-line PRs that block my team. I mean I maintain the ability to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read any diff and understand its consequences&lt;/li&gt;
&lt;li&gt;Prototype a new architecture in a weekend to validate a decision&lt;/li&gt;
&lt;li&gt;Debug a production incident without needing four people to translate&lt;/li&gt;
&lt;li&gt;Judge technical tradeoffs from first principles, not just from someone else's summary&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is different from being a bottleneck. It's maintaining &lt;strong&gt;technical fluency&lt;/strong&gt; — the same way a CFO who used to be an auditor still does their own taxes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost of Abstraction
&lt;/h2&gt;

&lt;p&gt;When a leader loses technical fluency, they start making decisions by proxy. They ask their engineers for estimates and multiply them by 1.5. They trust architecture diagrams without questioning the load assumptions baked into them. They approve a rewrite because two senior engineers recommended it.&lt;/p&gt;

&lt;p&gt;Sometimes this works. Often it doesn't.&lt;/p&gt;

&lt;p&gt;The deepest technical debt I've seen in companies wasn't in the codebase — it was in leadership's understanding of the codebase. When that debt gets called in (a production incident, a failed deadline, a surprise scope expansion), the leader who can't read the code has no ground to stand on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Right Division of Labor
&lt;/h2&gt;

&lt;p&gt;The question isn't &lt;em&gt;"should the CTO code?"&lt;/em&gt; The question is &lt;em&gt;"what's the highest-leverage use of a CTO's time?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That answer changes constantly. In week one of a startup, the CTO probably should be writing most of the backend. In week 100, they probably shouldn't be owning any production path. But sliding all the way to &lt;em&gt;never coding&lt;/em&gt; is a mistake.&lt;/p&gt;

&lt;p&gt;Some of the most valuable coding I do today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Greenfield prototypes&lt;/strong&gt;: When I need to validate a product direction without committing engineering capacity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tooling and infrastructure&lt;/strong&gt;: Internal tools that my team will use but aren't on anyone's roadmap&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Emergency debugging&lt;/strong&gt;: When the on-call engineer is stuck and I can cut through faster&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code review on critical paths&lt;/strong&gt;: Not as a gatekeeper, but as a second set of eyes on irreversible decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What You Lose When You Stop
&lt;/h2&gt;

&lt;p&gt;Muscle memory for debugging atrophies quickly. After six months away from a codebase, I find myself googling syntax I used to type from memory. That's fine — I can relearn it. What's harder to rebuild is the &lt;strong&gt;intuition&lt;/strong&gt; for where complexity hides.&lt;/p&gt;

&lt;p&gt;Every codebase has a shape. Senior engineers know where to look when something goes wrong. That knowledge comes from living in the code, not from reading architecture documents.&lt;/p&gt;

&lt;p&gt;Leaders who stay technical don't just maintain their own fluency — they maintain their &lt;strong&gt;credibility&lt;/strong&gt; with the engineering team. You can't bullshit engineers about complexity for long if they know you shipped a production service last quarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Counter-Argument
&lt;/h2&gt;

&lt;p&gt;The strongest argument against CTOs coding is opportunity cost. Every hour I spend on a PR is an hour I'm not spending on hiring, strategy, or unblocking my team.&lt;/p&gt;

&lt;p&gt;This is a real tension. I solve it by being explicit about &lt;em&gt;what kind of coding&lt;/em&gt; counts as leadership-leverage vs. individual-contributor work. Shipping product features is usually not the right use of my time. Maintaining the architectural context that makes all other technical decisions better — that is.&lt;/p&gt;

&lt;p&gt;Code is how ideas become systems. A leader who can't read the code can't fully understand the systems they're responsible for. That's not a philosophical position — it's a practical one.&lt;/p&gt;

&lt;p&gt;Stay technical. Just be intentional about what that means.&lt;/p&gt;

</description>
      <category>leadership</category>
      <category>engineering</category>
      <category>career</category>
    </item>
    <item>
      <title>The AI Hype Is Exhausting. Here's What's Actually Worth Paying Attention To.</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:10:05 +0000</pubDate>
      <link>https://dev.to/denis_moroz/the-ai-hype-is-exhausting-heres-whats-actually-worth-paying-attention-to-4jon</link>
      <guid>https://dev.to/denis_moroz/the-ai-hype-is-exhausting-heres-whats-actually-worth-paying-attention-to-4jon</guid>
      <description>&lt;p&gt;I want to be honest with you: I almost didn't write this post.&lt;/p&gt;

&lt;p&gt;Not because I don't have thoughts on it. I have too many. And that's the problem. Every week there are approximately four hundred new AI announcements, three conflicting hot takes, two long-form think pieces that completely contradict each other, and one tweet that goes viral for being confidently wrong.&lt;/p&gt;

&lt;p&gt;It's exhausting. And I think most of it doesn't matter.&lt;/p&gt;

&lt;p&gt;So here's my actual filter. The one I use to decide what's worth reading, what's worth testing, and what's worth completely ignoring.&lt;/p&gt;




&lt;h2&gt;
  
  
  The noise I skip
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"AI will replace [profession] in X years."&lt;/strong&gt; I have read this headline about lawyers, writers, radiologists, programmers, and teachers. For some of these I've seen confident estimates ranging from 2 to 25 years, sometimes in the same week. Nobody knows. The models doing these predictions are guessing. Move on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New model benchmark announcements.&lt;/strong&gt; Every new model is the "best ever" on some benchmark. Benchmarks are useful for researchers comparing controlled capabilities. They're not useful for figuring out whether you should change your workflow. I care about what the model can actually do in the context I use it. That requires trying it, not reading the press release.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI company raises massive round.&lt;/strong&gt; Good for them. This tells me the investors think there's money to be made. It tells me nothing about whether the product is good or whether you should use it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;\"AI is just a bubble.\"&lt;/strong&gt; Maybe. The internet was also a bubble in 2000 and it also ended up being the most significant infrastructure of the 21st century. \"Bubble\" and \"real and important\" are not mutually exclusive.&lt;/p&gt;




&lt;h2&gt;
  
  
  The signal I pay attention to
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Behavior changes in people I trust.&lt;/strong&gt; Not what they say they believe about AI — what they actually do. If a developer I respect switches to a new coding tool and sticks with it for six months, that's more signal than a hundred benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Things that work without a tutorial.&lt;/strong&gt; Good tools don't require you to learn how to prompt them. If I have to take a course to get value out of something, the thing isn't ready yet. The best AI tools I've used feel obvious on first use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;New capabilities, not new interfaces.&lt;/strong&gt; The AI space is full of wrappers — products built on top of GPT-4 or Claude that add a specialized UI. Some of these are useful. But they're not new capability. Actual new capability is when something becomes possible that genuinely wasn't before. That's rare. That's worth paying attention to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When non-technical people start using something.&lt;/strong&gt; Not because that makes it legitimate, but because it means the interface problem has been solved. Broad adoption by non-technical users is a reliable indicator that a tool has actually crossed the usability threshold. It took years for the smartphone to get there. AI tools are starting to get there in months.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I think is actually worth your attention right now
&lt;/h2&gt;

&lt;p&gt;I'll be specific, since I just told you I hate vague takes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agentic shift.&lt;/strong&gt; AI systems that can take a sequence of actions — browse the web, write a file, run code, send an email — without constant human input. This is where the work is happening right now, and it's early enough that the patterns aren't set yet. If you want to understand where things are going in the next 12–18 months, this is it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Voice interfaces maturing.&lt;/strong&gt; The gap between voice as a party trick and voice as an actual interface is closing fast. I'm not talking about Alexa. I'm talking about full conversational interfaces that can handle ambiguity, follow context, and take action. The latency is still too high but it's dropping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The commoditization of base models.&lt;/strong&gt; Claude, GPT-4, Gemini, and others are getting close enough that the underlying model matters less than the integration, the context, and the interface. This changes the competitive landscape significantly and shifts the interesting work to the application layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest version
&lt;/h2&gt;

&lt;p&gt;I don't know which companies will win. I don't know which models will matter in three years. I don't know if AGI is two years away or twenty.&lt;/p&gt;

&lt;p&gt;What I do know: the tools available today are already meaningfully useful if you're willing to actually use them instead of just reading about them. And the space is moving fast enough that your filter matters more than your forecast.&lt;/p&gt;

&lt;p&gt;Stop trying to predict. Start paying attention to what's working.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is the kind of honest take I share regularly at &lt;a href="https://denismoroz.ai" rel="noopener noreferrer"&gt;denismoroz.ai&lt;/a&gt;. If you want the actual signal without the noise, the newsletter is where that lives.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opinion</category>
      <category>strategy</category>
    </item>
    <item>
      <title>The Real ROI of AI at Work (It's Not What Your Vendor Is Claiming)</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:09:30 +0000</pubDate>
      <link>https://dev.to/denis_moroz/the-real-roi-of-ai-at-work-its-not-what-your-vendor-is-claiming-5ed3</link>
      <guid>https://dev.to/denis_moroz/the-real-roi-of-ai-at-work-its-not-what-your-vendor-is-claiming-5ed3</guid>
      <description>&lt;p&gt;Every AI vendor has a number. "$80,000 saved per employee per year." "10x faster." "2,000 hours returned to the business." The numbers are large, compelling, and almost always wrong — not because companies are lying, but because they're measuring the wrong things.&lt;/p&gt;

&lt;p&gt;I've spent enough time working with AI in business contexts to have a clear picture of what the ROI conversation usually gets wrong. Let me break it down the way I actually think about it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Standard ROI Story (And Why It's Incomplete)
&lt;/h2&gt;

&lt;p&gt;The typical vendor claim goes like this: AI does Task X. Task X used to take Y hours. Therefore you've saved Y hours × hourly rate = $Z.&lt;/p&gt;

&lt;p&gt;This math is real but incomplete in three ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Time saved ≠ cost saved.&lt;/strong&gt; If AI shaves 20% off the time your marketing team spends on copy drafts, you haven't necessarily cut costs — you've freed up capacity. That's valuable, but it's only valuable if the team uses that capacity for something higher-return. If it disappears into slightly longer meetings, you've improved morale at best.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Time saved ≠ quality kept.&lt;/strong&gt; This is the one almost nobody measures. AI speeds up output. It doesn't always preserve output quality at that speed. I've seen teams celebrate a 3x output increase from AI-assisted copy, then spend weeks quietly untangling why conversion rates dropped. Speed without quality isn't an improvement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The hidden costs don't show up in the headline.&lt;/strong&gt; Prompt tuning. Output review. Occasional corrections. The cognitive cost of keeping humans oriented in a workflow that AI is partially running. The real total cost of using AI seriously is higher than the license fee, and it's rarely included in the ROI pitch.&lt;/p&gt;
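
&lt;p&gt;If you want that math in one place, here's a toy version of the adjusted calculation. It's a forcing function, not a finance model; the input names are mine, not an industry standard:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# The vendor math, extended with the three missing terms.
# All inputs are things you can measure; the function just keeps you honest.
def honest_monthly_roi(hours_saved, hourly_rate,
                       capacity_redirected,  # 0..1: share of freed time reused well
                       review_hours,         # human time spent checking AI output
                       error_cost,           # expected cost of uncaught mistakes
                       license_fee):
    gross = hours_saved * hourly_rate * capacity_redirected
    hidden = review_hours * hourly_rate + error_cost + license_fee
    return gross - hidden

# The vendor pitch assumes capacity_redirected = 1.0 and hidden costs of zero.
&lt;/code&gt;&lt;/pre&gt;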




&lt;h2&gt;
  
  
  The Framework I Actually Use
&lt;/h2&gt;

&lt;p&gt;I think about AI ROI across three dimensions. Not a formula — more of a forcing function for honest evaluation.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Time Saved
&lt;/h3&gt;

&lt;p&gt;The real question here isn't "how much time does the task take now vs. before?" It's: &lt;strong&gt;what happens with the time that's freed?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A team that uses AI to process customer feedback 5x faster is doing something valuable — but only if the freed time goes into actually responding to that feedback, not more reporting. Ask explicitly: what is the recaptured time flowing toward? If you can't answer that, the time savings are theoretical.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Quality Delta
&lt;/h3&gt;

&lt;p&gt;For every AI-assisted workflow, track a quality metric before and after. This sounds obvious and almost nobody does it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing tasks: track engagement, completion rates, replies — whatever matters for that content.&lt;/li&gt;
&lt;li&gt;Decision-support tasks: track decision outcomes over time. Are you making better calls? More confident ones?&lt;/li&gt;
&lt;li&gt;Research tasks: track accuracy on sampled outputs. In what percentage of AI-generated summaries would you have caught an error?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quality can go up (AI helps produce more polished first drafts, catches errors humans miss), stay flat (AI speeds up work without changing its substance), or go down (AI introduces confident-sounding errors that don't get caught). You need to know which one you're experiencing.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Trust Built or Eroded
&lt;/h3&gt;

&lt;p&gt;This is the long game, and most organizations are ignoring it.&lt;/p&gt;

&lt;p&gt;Every time AI produces something wrong and it gets caught before causing harm: trust goes up slightly, the workflow gets better.&lt;br&gt;
Every time AI produces something wrong and it doesn't get caught: trust erodes — sometimes slowly, sometimes catastrophically when the error surfaces later.&lt;/p&gt;

&lt;p&gt;If you're deploying AI without clear human review points for high-stakes outputs, you're making a bet that the trust-erosion track doesn't activate. Some organizations will win that bet. Many won't.&lt;/p&gt;

&lt;p&gt;The organizations that will do best with AI long-term are building cultures where it's normal and expected to verify AI outputs — not because the tool is bad, but because that's how you build a system that can be trusted at scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Self-Assessment You Can Actually Do
&lt;/h2&gt;

&lt;p&gt;Here's a quick way to evaluate any AI workflow you're considering or already running.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For each AI-assisted task in your work, answer:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What was the output quality before AI? (Be honest about what "good" actually looked like.)&lt;/li&gt;
&lt;li&gt;What is the output quality now? (Spot-check 10 outputs. Don't trust vibes.)&lt;/li&gt;
&lt;li&gt;Where does the saved time actually go? (Track this for one week.)&lt;/li&gt;
&lt;li&gt;What are the failure modes? (What does it look like when this goes wrong, and how often does that happen?)&lt;/li&gt;
&lt;li&gt;Who reviews AI outputs before they become consequential? (If the answer is "nobody," that's your biggest risk.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most people skip questions 4 and 5. Those are the ones that cost you.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Good ROI Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;The teams I've seen genuinely benefit from AI share a few traits:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They use AI for high-volume, lower-stakes tasks first.&lt;/strong&gt; Email drafts. Research summaries. First-pass document review. Routine data processing. These have short feedback loops — errors are caught quickly and the cost of getting one wrong is low.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They measure before they automate.&lt;/strong&gt; They know what the baseline looks like, so they can actually compare.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They add review steps, not remove them.&lt;/strong&gt; At least initially. AI in the middle of a workflow with a human at the output end is significantly more reliable than AI at the output end with nothing after it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They think in compounding returns, not one-time savings.&lt;/strong&gt; The ROI of good AI integration isn't a one-time efficiency jump — it's a gradually improving system where the humans and AI get better at working together. That takes time and looks slow at first.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Honest Benchmark
&lt;/h2&gt;

&lt;p&gt;Here's a simple framing I'd give any team evaluating AI:&lt;/p&gt;

&lt;p&gt;If your AI use is making your work &lt;strong&gt;faster, roughly as accurate, and you're redirecting the saved time to something valuable&lt;/strong&gt; — that's genuinely good ROI.&lt;/p&gt;

&lt;p&gt;If your AI use is making your work &lt;strong&gt;faster but subtly less accurate, with no review step in place&lt;/strong&gt; — you're borrowing against future trust.&lt;/p&gt;

&lt;p&gt;If your AI use is &lt;strong&gt;mostly serving as demos and proof-of-concept projects that haven't changed any real workflows&lt;/strong&gt; — your ROI is zero, and the investment is theater.&lt;/p&gt;

&lt;p&gt;Be honest about which of those describes you right now.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next: what the latest AI release actually means for the people in the room who aren't engineers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aimeetsbusiness</category>
      <category>honesttakes</category>
      <category>strategy</category>
    </item>
    <item>
      <title>LLMs in Production: What No One Tells You</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:08:54 +0000</pubDate>
      <link>https://dev.to/denis_moroz/llms-in-production-what-no-one-tells-you-1i84</link>
      <guid>https://dev.to/denis_moroz/llms-in-production-what-no-one-tells-you-1i84</guid>
      <description>&lt;p&gt;Deploying a language model demo is easy. Running it in production — reliably, at scale, within budget — is not. After shipping several LLM-backed products, here's the honest picture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Is Not Linear
&lt;/h2&gt;

&lt;p&gt;Every engineer does the math: input tokens × input price + output tokens × output price (per 1M tokens), times expected volume = monthly bill. Then they ship to production and discover the bill is 4x the estimate.&lt;/p&gt;

&lt;p&gt;Why? Because production traffic is never as clean as your prototype. Real users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send ambiguous queries that need clarification rounds&lt;/li&gt;
&lt;li&gt;Retry when responses feel off&lt;/li&gt;
&lt;li&gt;Trigger edge cases your prompt never anticipated&lt;/li&gt;
&lt;li&gt;Explore the product in ways you didn't model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Budget 2-3x your projected token usage for the first quarter in production. Track cost per user, not cost in aggregate — aggregate numbers hide the outliers who will blow your budget.&lt;/p&gt;
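
&lt;p&gt;Here's that math as a sketch with the production buffer built in; the prices are placeholders, not anyone's current rate card:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Back-of-envelope projection with separate input/output pricing.
IN_PRICE_PER_M = 3.00    # $ per 1M input tokens (placeholder)
OUT_PRICE_PER_M = 15.00  # $ per 1M output tokens (placeholder)

def request_cost(tokens_in, tokens_out):
    return (tokens_in * IN_PRICE_PER_M + tokens_out * OUT_PRICE_PER_M) / 1e6

def projected_monthly_bill(users, requests_per_user, avg_in, avg_out,
                           production_buffer=2.5):
    """Prototype math times a 2-3x buffer for retries, clarification
    rounds, and the edge cases real users will find."""
    clean = users * requests_per_user * request_cost(avg_in, avg_out)
    return clean * production_buffer

# Track cost per user, not just the total -- outliers hide in aggregates.
per_user = projected_monthly_bill(1, 200, avg_in=1500, avg_out=600)
&lt;/code&gt;&lt;/pre&gt;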

&lt;h2&gt;
  
  
  Prompt Engineering is Software Engineering
&lt;/h2&gt;

&lt;p&gt;Treat prompts like code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Version control them&lt;/li&gt;
&lt;li&gt;Test them against a regression suite&lt;/li&gt;
&lt;li&gt;Review changes before deployment&lt;/li&gt;
&lt;li&gt;Monitor production drift&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've seen teams ship prompt changes as untracked edits to environment variables. Three weeks later, a regression in a corner case they didn't know existed. No way to diff, no way to roll back.&lt;/p&gt;

&lt;p&gt;Use a prompt management system. At minimum, store prompts in your repo, not in .env files.&lt;/p&gt;
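
&lt;p&gt;The minimal version needs nothing fancy. A sketch of what "prompts in the repo" can look like (the file layout and names are illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Prompts live in the repo as versioned files, so every change shows up
# in diffs and code review. Layout is illustrative.
from pathlib import Path

PROMPT_DIR = Path(__file__).parent / "prompts"   # prompts/summarize_v3.txt, ...

def load_prompt(name, version):
    path = PROMPT_DIR / f"{name}_v{version}.txt"
    return path.read_text(encoding="utf-8")

# Pin the version at the call site; bumping it is a reviewable one-line diff.
SUMMARIZE_PROMPT = load_prompt("summarize", version=3)
&lt;/code&gt;&lt;/pre&gt;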

&lt;h2&gt;
  
  
  Latency Has Tails
&lt;/h2&gt;

&lt;p&gt;Average latency for GPT-4-class models is roughly 1-3 seconds for typical requests. P99 is often 8-15 seconds. P99.9 includes timeouts.&lt;/p&gt;

&lt;p&gt;For most applications, you should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Stream all responses — users tolerate latency much better when they see tokens appearing&lt;/li&gt;
&lt;li&gt;Set aggressive timeouts and have a fallback path (retry with a faster model, return a cached response)&lt;/li&gt;
&lt;li&gt;Track latency percentiles, not averages — averages hide the user experience for 1 in 100&lt;/li&gt;
&lt;/ol&gt;
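
&lt;p&gt;A minimal sketch of points 1 and 2, assuming the OpenAI Python SDK; the model names and timeout values are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Stream the primary model, time-box it, fall back to a faster model.
from openai import OpenAI, APITimeoutError

client = OpenAI()

def answer(messages):
    try:
        stream = client.chat.completions.create(
            model="gpt-4o", messages=messages, stream=True, timeout=10.0)
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta   # visible progress makes latency tolerable
    except APITimeoutError:
        # Fallback path: one retry on a faster, cheaper model.
        resp = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages, timeout=8.0)
        yield resp.choices[0].message.content
&lt;/code&gt;&lt;/pre&gt;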

&lt;h2&gt;
  
  
  The Prompt Injection Problem
&lt;/h2&gt;

&lt;p&gt;If your product processes user-provided text through an LLM, you have a prompt injection surface. This is not theoretical.&lt;/p&gt;

&lt;p&gt;Common scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Document summarization where the document contains &lt;code&gt;"Ignore previous instructions"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Customer support bots that process user-submitted tickets&lt;/li&gt;
&lt;li&gt;Code review tools that analyze user-submitted code with embedded instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Defense in depth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sanitize inputs (strip instruction-like patterns before including in prompts)&lt;/li&gt;
&lt;li&gt;Separate system and user content with role delimiters the model respects&lt;/li&gt;
&lt;li&gt;Treat LLM outputs as untrusted user input before rendering them&lt;/li&gt;
&lt;li&gt;Monitor for anomalous output patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No defense is perfect. Assume attackers will find ways around your guardrails and design your system so that a successful injection doesn't cause irreversible harm.&lt;/p&gt;
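
&lt;p&gt;As one concrete layer, here's what a naive input filter plus role separation can look like. The patterns are illustrative, and this is a speed bump, not a wall:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import re

# First-pass filter for instruction-like patterns in untrusted text.
# Attackers will phrase around any fixed list; this is one layer of several.
SUSPICIOUS = re.compile(
    r"(ignore (all |any )?(previous|prior|above) instructions"
    r"|disregard (the )?system prompt"
    r"|you are now)",
    re.IGNORECASE)

def sanitize(untrusted):
    return SUSPICIOUS.sub("[filtered]", untrusted)

def build_messages(document):
    # Untrusted content goes only in the user slot, clearly delimited,
    # never concatenated into the system prompt.
    return [
        {"role": "system",
         "content": "Summarize only the document between the markers."},
        {"role": "user",
         "content": "---BEGIN DOCUMENT---\n" + sanitize(document)
                    + "\n---END DOCUMENT---"},
    ]
&lt;/code&gt;&lt;/pre&gt;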

&lt;h2&gt;
  
  
  Evals are Non-Negotiable
&lt;/h2&gt;

&lt;p&gt;You cannot ship changes to your AI system confidently without evals. A test suite of 50-100 representative prompts and expected output characteristics (not exact string matches — LLMs are stochastic) is the minimum bar.&lt;/p&gt;

&lt;p&gt;What to eval:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Task accuracy&lt;/strong&gt;: Does the model do the right thing?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format compliance&lt;/strong&gt;: Does the output match the expected structure?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refusal rate&lt;/strong&gt;: Is the model refusing valid requests?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination rate&lt;/strong&gt;: Is the model making up facts in a domain where you can verify?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run evals before every prompt change, model upgrade, and temperature adjustment. The cost of an eval suite is a rounding error compared to the cost of a silent regression in production.&lt;/p&gt;
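
&lt;p&gt;The harness can be very small. A sketch of the shape, where &lt;code&gt;run_model&lt;/code&gt; stands in for whatever invokes your deployed prompt:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import json

def is_json(text):
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

def eval_case(run_model, case):
    # each case: {"id": ..., "prompt": ..., "expects_json": ..., "must_mention": [...]}
    output = run_model(case["prompt"])
    return {
        # Refusal rate: a valid request should not be declined (crude proxy).
        "not_refused": "i can't help" not in output.lower(),
        # Format compliance: structure, not exact strings (LLMs are stochastic).
        "valid_json": (not case.get("expects_json")) or is_json(output),
        # Task accuracy proxy: required facts must appear somewhere.
        "mentions_required": all(t in output for t in case.get("must_mention", [])),
    }

def run_suite(run_model, cases):
    failures = [c["id"] for c in cases
                if not all(eval_case(run_model, c).values())]
    return failures   # gate the deploy on an empty list
&lt;/code&gt;&lt;/pre&gt;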

&lt;h2&gt;
  
  
  Model Versioning Surprises
&lt;/h2&gt;

&lt;p&gt;Model providers update their models without always announcing breaking behavioral changes. A model that was reliable at your task in Q1 may behave differently in Q4, even with the same version tag.&lt;/p&gt;

&lt;p&gt;Pin specific model versions in production (e.g., &lt;code&gt;gpt-4-0613&lt;/code&gt;, not &lt;code&gt;gpt-4&lt;/code&gt;). Subscribe to your provider's changelog. Run your eval suite against any model update before rolling it out.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Matters
&lt;/h2&gt;

&lt;p&gt;After all of this, the one thing that predicts success more than anything else: &lt;strong&gt;feedback loops&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Teams that instrument everything (latency, cost, user thumbs-up/down, session length), evaluate regularly, and iterate on their prompts weekly consistently outperform teams that ship a v1 and assume the model handles the rest.&lt;/p&gt;

&lt;p&gt;The model is a component in a system. The system needs the same engineering discipline as any other component in production.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>engineering</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>I Used AI to Plan My Entire Month. Here's What Actually Worked</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:08:19 +0000</pubDate>
      <link>https://dev.to/denis_moroz/i-used-ai-to-plan-my-entire-month-heres-what-actually-worked-1iff</link>
      <guid>https://dev.to/denis_moroz/i-used-ai-to-plan-my-entire-month-heres-what-actually-worked-1iff</guid>
      <description>&lt;p&gt;A few weeks ago I decided to try an experiment: hand everything off to AI. Calendar, meals, work schedule, daily journaling prompts — all of it. Not because I thought AI would be perfect at it, but because I wanted to know exactly where it helps and where it doesn't.&lt;/p&gt;

&lt;p&gt;Here's the honest account.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Actually Did
&lt;/h2&gt;

&lt;p&gt;I started on a Sunday. I dumped everything I needed to get done in April into Claude — work projects, personal commitments, fitness goals, social obligations, a few things I'd been putting off for weeks. Then I asked it to build me a monthly plan.&lt;/p&gt;

&lt;p&gt;Not just a calendar. A system. When to do focused work. When to batch errands. When to leave buffer for the unexpected. I gave it my working hours, my energy patterns (morning person, slower after 3pm), and a rough sense of which projects were highest priority.&lt;/p&gt;

&lt;p&gt;It gave me back a plan. Surprisingly good. Organized by week, with themes — a "heavy writing" week, a "catch-up and admin" week, a week with more space carved in because I'd mentioned a friend visiting.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Worked
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Calendar blocking.&lt;/strong&gt; This was the clearest win. AI is genuinely good at looking at a list of obligations and suggesting how to distribute them. It's better than I am at not overloading Tuesday just because Tuesday is the most obvious day. It spread things evenly in a way I almost never do manually, and it respected my stated energy patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meal planning.&lt;/strong&gt; I told it I wanted to eat well, I don't want to spend more than 30 minutes cooking on weeknights, and I had a few things I can't eat. It gave me a week of dinners with a shopping list organized by section of the grocery store. I did this for four weeks in a row. It worked. I spent less time standing in front of the fridge making bad decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Journaling prompts.&lt;/strong&gt; I asked for a daily prompt for each day of the month — something that would push me to actually reflect rather than just write "had a good day." These were genuinely good. Not generic ("what are you grateful for?"), but varied and specific. Some were uncomfortable in the right way. I didn't use all 30, but I used most of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reviewing decisions out loud.&lt;/strong&gt; A few times I used AI like a sounding board — I had a decision I was sitting with, I described the situation, and I asked it to help me think through it. This worked better than I expected. Not because it made the decision for me, but because having to explain it clearly enough for AI to understand forced me to articulate what I actually knew and what I was avoiding.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Didn't Work (Or Felt Weird)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;It doesn't know what a Tuesday actually feels like.&lt;/strong&gt; The plan was technically sound. But AI has no sense of the particular kind of tired you feel after back-to-back video calls, or the fact that Thursday afternoons at my desk have a different texture than Thursday mornings. I ended up adjusting things mid-week more than I expected — not because the plan was wrong, but because life is granular in ways a calendar can't capture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Over-optimization.&lt;/strong&gt; The first draft had me scheduled pretty solidly. It was a lot. I had to explicitly push back and ask it to build in blank space — time that wasn't for anything. It's not intuitive to AI that unscheduled time has value. You have to name it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accountability doesn't come with the plan.&lt;/strong&gt; The plan was good. Following it was still on me. I assumed — without quite saying so — that having a plan from AI would somehow make it easier to stick to. It didn't. A plan from AI is just a plan. What makes you follow through on it hasn't changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Journaling felt slightly clinical.&lt;/strong&gt; A few of the prompts were great. Some felt like they were generated by someone who'd read a lot of journaling content but hadn't actually journaled. They were technically correct prompts — and I'd rather have those than none — but occasionally I'd read one and just swap it for my own question.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'll Keep Doing
&lt;/h2&gt;

&lt;p&gt;The grocery list and meal planning, without question. That's now a permanent part of my Sunday routine. Twenty minutes of prompting saves me hours of decision fatigue across the week.&lt;/p&gt;

&lt;p&gt;Calendar blocking for heavy weeks. Whenever I have a lot coming in, I'll run it through AI before I schedule it manually. The spread is better and I'm less likely to accidentally make one day brutal.&lt;/p&gt;

&lt;p&gt;Using AI as a thinking-out-loud partner when I'm stuck on a decision. This is underrated.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Honest Summary
&lt;/h2&gt;

&lt;p&gt;AI is an excellent planner and a terrible accountability partner. It can see the whole month at once and suggest a sensible shape for it. It cannot feel the week with you, understand when you're running on empty, or notice when you've quietly stopped following the plan.&lt;/p&gt;

&lt;p&gt;Use it to build the structure. Then show up for it yourself.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next week: a blunt look at why most AI agents are significantly less impressive than the demos suggest.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aiforyourlife</category>
      <category>productivity</category>
      <category>honesttakes</category>
    </item>
    <item>
      <title>How One Small Team Replaced 3 Manual Workflows With AI (And What Actually Broke)</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:07:43 +0000</pubDate>
      <link>https://dev.to/denis_moroz/how-one-small-team-replaced-3-manual-workflows-with-ai-and-what-actually-broke-28fk</link>
      <guid>https://dev.to/denis_moroz/how-one-small-team-replaced-3-manual-workflows-with-ai-and-what-actually-broke-28fk</guid>
      <description>&lt;p&gt;A friend of mine runs a 7-person product agency. In late 2025 he messaged me: "We're probably saving 40 hours a month now. But three things broke that we didn't expect."&lt;/p&gt;

&lt;p&gt;I asked him to walk me through it. Here's what they did, how they did it, and — importantly — what went sideways.&lt;/p&gt;

&lt;p&gt;I'm sharing this because most AI-at-work content is either cheerleading or fear. Neither helps you figure out what to actually do.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;Seven people: 2 designers, 2 developers, 1 strategist, 1 ops person, and my friend who runs the whole thing. They do product strategy, UX design, and early-stage build work for startups. A project typically runs 6–12 weeks.&lt;/p&gt;

&lt;p&gt;They were doing well, growing, and also drowning in process overhead. Three workflows in particular were eating time:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Client meeting notes&lt;/strong&gt; — Summarizing calls, distributing action items, keeping clients informed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proposal writing&lt;/strong&gt; — New business pitches took 6–10 hours of senior time per pitch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QA reporting&lt;/strong&gt; — Developers writing bug reports and summarizing test runs by hand&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These weren't broken workflows. They were working fine. They were just slow and expensive.&lt;/p&gt;




&lt;h2&gt;
  
  
  What They Changed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Workflow 1: Meeting Notes
&lt;/h3&gt;

&lt;p&gt;They started routing all client calls through a transcription service (they use Fireflies), then piping the transcripts into Claude with a prompt that extracts decisions made, open questions, and action items with owners.&lt;/p&gt;

&lt;p&gt;The output goes into Notion automatically. The ops person reviews and sends the client summary — a task that used to take 45 minutes now takes about 5.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; ~40 minutes saved per client call. They have 3–4 client touchpoints per week.&lt;/p&gt;
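
&lt;p&gt;He didn't share the exact prompt, but the shape of the step is easy to sketch. This assumes the Anthropic Python SDK; the model name and prompt wording are my guesses, not theirs:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import anthropic

client = anthropic.Anthropic()

EXTRACTION_PROMPT = """From the call transcript below, extract:
1. Decisions made
2. Open questions
3. Action items, each with an owner

Return markdown with those three headings.

Transcript:
{transcript}"""

def summarize_call(transcript):
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",   # placeholder; pin what you deploy
        max_tokens=1024,
        messages=[{"role": "user",
                   "content": EXTRACTION_PROMPT.format(transcript=transcript)}],
    )
    return msg.content[0].text   # a human still reviews before it goes out
&lt;/code&gt;&lt;/pre&gt;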

&lt;h3&gt;
  
  
  Workflow 2: Proposal Writing
&lt;/h3&gt;

&lt;p&gt;Proposals at this agency follow a recognizable structure: situation analysis, recommended approach, team and process, timeline, investment. My friend built a prompt template that pulls from a Notion database of past projects and outputs a first draft.&lt;/p&gt;

&lt;p&gt;A senior strategist still reviews and personalizes everything — especially the situation analysis, which requires real understanding of the client. But the skeleton work that used to take 4–6 hours now takes about 90 minutes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Senior time on proposals dropped by ~60–65%. They're pitching more because the cost of pitching dropped.&lt;/p&gt;
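
&lt;p&gt;The mechanics here are mostly templating. A stripped-down sketch, with the Notion database reduced to a plain list and every field invented:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Each past project is a record the draft can cite; in their setup this
# comes from a Notion database, here it's inlined.
PAST_PROJECTS = [
    {"client": "Fintech startup", "scope": "UX redesign", "outcome": "signups +30%"},
]

TEMPLATE = """Draft a proposal with sections: situation analysis,
recommended approach, team and process, timeline, investment.

Prospect brief:
{brief}

Relevant past work:
{past_work}

Flag the situation analysis as NEEDS-SENIOR-REVIEW."""

def build_proposal_prompt(brief):
    past_work = "\n".join(
        f"- {p['client']}: {p['scope']} ({p['outcome']})" for p in PAST_PROJECTS)
    return TEMPLATE.format(brief=brief, past_work=past_work)
&lt;/code&gt;&lt;/pre&gt;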

&lt;h3&gt;
  
  
  Workflow 3: QA Reporting
&lt;/h3&gt;

&lt;p&gt;The developers started pasting test results and error logs into Claude and asking for structured bug reports in their standard format. They also use it to generate first drafts of test case lists when starting a new feature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Uneven. More on this in a moment.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Broke
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The junior devs stopped learning how to debug
&lt;/h3&gt;

&lt;p&gt;This is the one my friend feels worst about. When a junior developer pastes an error into Claude and gets a clear explanation, they understand the error. But they don't develop the debugging intuition that comes from sitting with confusing output for a while and working through it.&lt;/p&gt;

&lt;p&gt;One of his junior developers shipped a bug fix correctly but couldn't explain why the original error happened. "He knew the answer. He didn't understand the problem."&lt;/p&gt;

&lt;p&gt;They've since added a rule: for bugs that are good learning opportunities (ambiguous, architectural, novel), the junior has to form their own hypothesis first before using AI to check it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The proposal quality variance got worse
&lt;/h3&gt;

&lt;p&gt;Proposals got faster but more uneven. A good first draft made it easier to ship quickly. A mediocre first draft made it easier to ship something mediocre — because the revision felt like a smaller lift than it was.&lt;/p&gt;

&lt;p&gt;The baseline rose. The ceiling dropped. They fixed this by adding a mandatory senior edit on the framing and situation analysis sections specifically, treating those as non-delegatable.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Client meeting summaries lost texture
&lt;/h3&gt;

&lt;p&gt;The AI summaries were accurate and thorough. They were also flat. What got lost were the softer signals — the thing the client said hesitantly, the concern that got raised and then moved past, the vibe of the conversation.&lt;/p&gt;

&lt;p&gt;The ops person started adding a "Read Between the Lines" section at the bottom of each summary — two or three sentences written by hand about what wasn't in the transcript but was in the room. That solved it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Take From This
&lt;/h2&gt;

&lt;p&gt;AI didn't break their workflows. It accelerated whatever was already there — including the weaknesses.&lt;/p&gt;

&lt;p&gt;The junior dev issue was always a risk. Speed just made it surface faster. The proposal variance was latent in their review process. The summary flatness exposed how much they were relying on implicit knowledge that wasn't in documents.&lt;/p&gt;

&lt;p&gt;Every workflow they touched got faster. Every workflow they touched also exposed something they hadn't been explicit about before.&lt;/p&gt;

&lt;p&gt;That's not a reason not to use AI. It's a reason to use it with your eyes open. The things that break will tell you something true about your process.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I write about AI at work at &lt;a href="https://denismoroz.ai" rel="noopener noreferrer"&gt;denismoroz.ai&lt;/a&gt;. The newsletter is where I share this kind of case study in more depth.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>business</category>
      <category>casestudy</category>
    </item>
    <item>
      <title>Everyone's Building AI Agents. Most of Them Are Just Expensive Chatbots.</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:07:07 +0000</pubDate>
      <link>https://dev.to/denis_moroz/everyones-building-ai-agents-most-of-them-are-just-expensive-chatbots-380n</link>
      <guid>https://dev.to/denis_moroz/everyones-building-ai-agents-most-of-them-are-just-expensive-chatbots-380n</guid>
      <description>&lt;p&gt;"Agent" is the hottest word in AI right now. Every product announcement has one. Every startup deck mentions them. Your enterprise software vendor is definitely about to pitch you one.&lt;/p&gt;

&lt;p&gt;Most of them are not agents. They're chatbots with extra marketing.&lt;/p&gt;

&lt;p&gt;Let me explain the difference, and why it matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  What an Agent Actually Is
&lt;/h2&gt;

&lt;p&gt;A real AI agent does something specific that a chatbot cannot: it takes actions autonomously over time, with the goal of completing a task — not just generating text.&lt;/p&gt;

&lt;p&gt;The key components of an actual agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A goal&lt;/strong&gt; — something to accomplish, not just something to respond to&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access to tools&lt;/strong&gt; — ability to search the web, run code, call APIs, write files, interact with other software&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent memory&lt;/strong&gt; — enough context to pick up where it left off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision-making&lt;/strong&gt; — the ability to choose what to do next based on what it finds&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;An agent that books you a flight is different from a chatbot that explains how flights work. An agent that monitors your inbox and drafts responses while you sleep is different from an assistant that helps you draft one email when you ask.&lt;/p&gt;

&lt;p&gt;The gap between those two things is enormous.&lt;/p&gt;
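
&lt;p&gt;To make the distinction concrete, here's the skeleton of an agent loop with all four components visible. The tool and decision functions are stubs standing in for real APIs and a model call:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def search_web(query):                   # tool stub; a real agent calls a real API
    return f"results for {query!r}"

TOOLS = {"search_web": search_web}

def decide(goal, memory):
    # Stand-in for the model call that picks the next action
    # from the goal plus everything observed so far.
    if not memory:
        return "search_web", {"query": goal}
    return "done", {}

def run_agent(goal, max_steps=10):
    memory = []                          # persistent context across steps
    for _ in range(max_steps):           # always set a hard step budget
        action, args = decide(goal, memory)
        if action == "done":
            return memory
        observation = TOOLS[action](**args)   # the agent acts, not just talks
        memory.append((action, args, observation))
    return memory   # budget exhausted: surface to a human, don't loop forever
&lt;/code&gt;&lt;/pre&gt;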




&lt;h2&gt;
  
  
  What Most "Agents" Actually Are
&lt;/h2&gt;

&lt;p&gt;A chatbot with a search tool attached is not an agent. It's a chatbot with a search tool.&lt;/p&gt;

&lt;p&gt;A workflow that chains three API calls together is not an agent. It's automation with an LLM in the middle.&lt;/p&gt;

&lt;p&gt;A "copilot" that suggests what you should do next is not an agent. It's recommendations wrapped in AI language.&lt;/p&gt;

&lt;p&gt;These things can be genuinely useful. I use several of them. But calling them agents inflates expectations in ways that lead to real disappointment — and, more importantly, it obscures what the technology can actually do.&lt;/p&gt;

&lt;p&gt;The demos are particularly misleading. I've watched AI agent demos where the agent appears to fluidly and autonomously complete a complex multi-step task in real time. And then you try to replicate that workflow and discover it breaks on step three whenever the input is slightly different, requires constant babysitting, and costs five times what you expected.&lt;/p&gt;

&lt;p&gt;That gap — between the demo and the reality — is where a lot of money and trust is currently being lost.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Specific Problems With Current Agents
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Reliability degrades rapidly with complexity.&lt;/strong&gt; A single-step AI task is pretty reliable. Two steps: still good. Five steps: you're managing failure modes. Ten steps: you need a human in the loop or you will regret it. Real-world processes are almost always ten-plus steps with edge cases the agent has never encountered before.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They hallucinate into consequential actions.&lt;/strong&gt; When a chatbot makes something up, you read it and catch it (hopefully). When an agent makes something up and then &lt;em&gt;acts on it&lt;/em&gt; — sends an email, books an appointment, deletes a file — the error has already propagated. The cost of hallucination in an agentic context is fundamentally different than in a conversational one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context length is still a ceiling.&lt;/strong&gt; Agents need to hold a lot of context to complete multi-step tasks across time. Current models have gotten better, but a truly long-running agent still runs into limits. When it hits those limits, it starts forgetting. When it starts forgetting, tasks fail in ways that are hard to diagnose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recovery from errors is weak.&lt;/strong&gt; Humans, when we hit a wall, adapt. We backtrack, we try a different approach, we recognize when we're lost. Current agents mostly don't do this gracefully. When they fail, they often fail confusingly — keeping going when they should stop, or stopping when they should try again.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Agents Actually Work Right Now
&lt;/h2&gt;

&lt;p&gt;This isn't all negative. There are real use cases where agents are genuinely useful today:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bounded, well-defined tasks.&lt;/strong&gt; Research tasks with a clear endpoint. Data extraction from a fixed set of sources. Customer support triage within a defined scope. These work because the failure modes are narrow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High-volume, low-stakes work.&lt;/strong&gt; If you need 500 things processed and some percentage of errors is acceptable, agents are a good fit. The economics work when the alternative is manual labor and perfection isn't required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Internal tooling with human review.&lt;/strong&gt; Agents that generate outputs a human then reviews before action are more useful than fully autonomous agents. You get the speed benefit without the unrecoverable error problem.&lt;/p&gt;
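
&lt;p&gt;In code, that review step can be as simple as a queue sitting between "drafted" and "executed". A rough sketch with invented names, not any particular framework:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Draft-then-approve: the agent proposes, a human disposes.
review_queue = []

def propose(action):
    """Agent outputs land here instead of executing directly."""
    review_queue.append(action)

def review_all(execute):
    """A human walks the queue; only approved items run."""
    for action in review_queue:
        answer = input("Run " + repr(action) + "? [y/N] ")
        if answer.strip().lower() == "y":
            execute(action)
        else:
            print("skipped:", action)
    review_queue.clear()
&lt;/code&gt;&lt;/pre&gt;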

&lt;p&gt;&lt;strong&gt;Coding.&lt;/strong&gt; This is the one domain where AI agents are genuinely close to the hype. Cursor, GitHub Copilot Workspace, and similar tools can take a task description and do significant chunks of real engineering work. Still not perfect. Still needs review. But meaningfully more capable than in other domains.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Actually Look For
&lt;/h2&gt;

&lt;p&gt;If you're evaluating an AI agent product, I'd ask these questions before buying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What happens when it fails? (If the answer is unclear, it fails badly.)&lt;/li&gt;
&lt;li&gt;Is there a human review step before any irreversible action?&lt;/li&gt;
&lt;li&gt;What's the actual task it does, and is that task genuinely multi-step and autonomous — or is it one step dressed up in agent language?&lt;/li&gt;
&lt;li&gt;What does it cost when it runs many times? (Agentic workflows are expensive at scale.)&lt;/li&gt;
&lt;li&gt;Can I see it fail? (Demos show successes. Ask to see what a failure looks like.)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Honest Take
&lt;/h2&gt;

&lt;p&gt;AI agents are real, they're coming, and eventually they will do genuinely impressive things. But "eventually" and "right now" are different things, and the gap between them is currently being obscured by marketing at scale.&lt;/p&gt;

&lt;p&gt;Real agents are being built. They work in narrow, well-defined domains. They need oversight. They fail.&lt;/p&gt;

&lt;p&gt;Know what you're buying.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Next up: the real ROI of AI at work — and why your vendor's numbers are probably wrong.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidecoded</category>
      <category>honesttakes</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>Claude vs. ChatGPT vs. Gemini: Which One Should You Actually Use?</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:06:32 +0000</pubDate>
      <link>https://dev.to/denis_moroz/claude-vs-chatgpt-vs-gemini-which-one-should-you-actually-use-3mlm</link>
      <guid>https://dev.to/denis_moroz/claude-vs-chatgpt-vs-gemini-which-one-should-you-actually-use-3mlm</guid>
      <description>&lt;p&gt;The honest answer is: it depends. The less honest answer is the one most comparison posts give you, which is a table of features that tells you nothing about which tool is actually better for your work.&lt;/p&gt;

&lt;p&gt;Let me try to be more useful.&lt;/p&gt;

&lt;p&gt;I've used all three regularly for the past year. Here's how I actually think about them — not what their feature pages say, but what I've noticed in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  The short version, if you want it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use Claude if:&lt;/strong&gt; You do a lot of writing, analysis, or reasoning and you want responses that are coherent, nuanced, and feel like they're tracking what you actually mean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use ChatGPT if:&lt;/strong&gt; You need the widest feature set, the most integrations, or you're doing image generation alongside text work. Or if you've just been using it and it works for you — switching costs are real.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Gemini if:&lt;/strong&gt; You're deep in Google Workspace and want native integration with Docs, Sheets, Gmail, Drive. Or if you want the best real-time web search baked into your AI conversations.&lt;/p&gt;

&lt;p&gt;Now let me explain why.&lt;/p&gt;




&lt;h2&gt;
  
  
  Claude (Anthropic)
&lt;/h2&gt;

&lt;p&gt;Claude's strongest trait is what I'd call conversational coherence. You can have a long, complex conversation and it actually holds the thread. It tracks what you said 10 messages ago. It doesn't drift.&lt;/p&gt;

&lt;p&gt;The responses are also more likely to be careful when careful is warranted. Claude will flag uncertainty instead of bulldozing past it. This is sometimes annoying when you just want a quick answer. It's valuable when accuracy matters.&lt;/p&gt;

&lt;p&gt;I use Claude for: writing drafts, reasoning through complex decisions, any analysis where I want something that won't hallucinate confidently. The Projects feature is excellent — persistent context across sessions so I don't have to re-explain my situation every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt; Web browsing is available but feels bolted on. It has fewer third-party integrations than ChatGPT. If you want to generate images or run code in a sandbox environment with a lot of hand-holding, ChatGPT's interface is more developed for that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model I use:&lt;/strong&gt; Claude Sonnet 4 for daily work, Opus for anything that needs maximum reasoning depth.&lt;/p&gt;




&lt;h2&gt;
  
  
  ChatGPT (OpenAI)
&lt;/h2&gt;

&lt;p&gt;ChatGPT is the Swiss Army knife. It has the most features, the largest ecosystem of plugins and integrations, and it's been available the longest, which means the most people know how to use it. If you're working with a team and sharing prompts, there's a good chance they're using ChatGPT.&lt;/p&gt;

&lt;p&gt;DALL-E integration is native — if you want to switch between image generation and text in the same conversation, this is the smoothest experience. The Code Interpreter (now called Advanced Data Analysis) is genuinely impressive for data work if you're non-technical.&lt;/p&gt;

&lt;p&gt;I use ChatGPT for: image generation, data analysis with uploaded files, anything where I want to use an integration that doesn't exist yet in Claude.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt; The responses can be more verbose and less precise than Claude for complex reasoning. I've also found it more prone to confident hallucination on factual questions — it'll give you a firm answer when it should give you a hedged one. The model quality gap between GPT-4o and Claude's top models is smaller than it used to be, but I still reach for Claude when the output quality really matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model I use:&lt;/strong&gt; GPT-4o for most tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gemini (Google)
&lt;/h2&gt;

&lt;p&gt;Gemini's trump card is Google integration. If you're in Gmail, Docs, Drive, or Sheets all day, Gemini is already there. You can ask it to summarize your emails, write in a Google Doc directly, or pull data from your Drive without switching context. That integration advantage is real.&lt;/p&gt;

&lt;p&gt;Google Search is also deeply wired in. When I want an AI response grounded in current information rather than in whatever was frozen into the training data, Gemini is often the best option. It cites recent sources in a way that's actually useful.&lt;/p&gt;

&lt;p&gt;I use Gemini for: research that needs current web grounding, anything inside Google Workspace, quick summaries of Google Docs content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt; The conversational feel isn't quite there yet compared to Claude or ChatGPT. It's competent but I find myself trusting its reasoning less on nuanced questions. The Google Workspace integration is the strongest argument for it — without that advantage, I reach for Claude or ChatGPT first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model I use:&lt;/strong&gt; Gemini 2.0 Pro for workspace tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The question to actually ask yourself
&lt;/h2&gt;

&lt;p&gt;Not \"which one is best?\" but \"what am I trying to do and where does friction come from?\"&lt;/p&gt;

&lt;p&gt;If you write and reason a lot: Claude.&lt;br&gt;
If you work visually or need the broadest feature set: ChatGPT.&lt;br&gt;
If you live in Google Workspace: Gemini.&lt;/p&gt;

&lt;p&gt;The most expensive mistake is spending weeks deciding instead of just trying. Free tiers exist for all three. Pick one, use it for two weeks, then try another. You'll know quickly.&lt;/p&gt;

&lt;p&gt;And then pick one as your default. Not because the others are bad but because switching all the time carries its own cost — you never get deep enough with any of them to use it well.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I use AI every day and write about what actually works at &lt;a href="https://denismoroz.ai" rel="noopener noreferrer"&gt;denismoroz.ai&lt;/a&gt;. The newsletter is where I share the things I'm learning in real time.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>chatgpt</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Building AI Products That People Actually Use</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:05:56 +0000</pubDate>
      <link>https://dev.to/denis_moroz/building-ai-products-that-people-actually-use-31pp</link>
      <guid>https://dev.to/denis_moroz/building-ai-products-that-people-actually-use-31pp</guid>
      <description>&lt;p&gt;The graveyard of AI demos is enormous. Impressive benchmarks, slick interfaces, and… nobody uses them after the first week.&lt;/p&gt;

&lt;p&gt;I've shipped AI features at scale and consulted on dozens of AI product bets. The pattern is consistent: &lt;strong&gt;teams optimize for capability, not for behavior change&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Wrong Question
&lt;/h2&gt;

&lt;p&gt;Most teams ask: &lt;em&gt;"What can our model do?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The right question is: &lt;em&gt;"What behavior do we want to change, and why hasn't existing tooling changed it?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AI is a technology primitive, not a product. A hammer doesn't create the need for nails — the nails were always there, the users just didn't have a good way to hit them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Habit Anchoring
&lt;/h2&gt;

&lt;p&gt;The most durable AI products attach to existing habits. They don't create new workflows — they compress existing ones.&lt;/p&gt;

&lt;p&gt;GitHub Copilot works because developers were already writing code. The model fits inside the groove that decades of muscle memory carved. You don't need to teach the user a new mental model; you augment the existing one.&lt;/p&gt;

&lt;p&gt;The mistake: building standalone AI apps that require users to remember &lt;em&gt;when&lt;/em&gt; to open them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson&lt;/strong&gt;: Find where users already have the intent, and compress the gap between intent and outcome.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Blank Slate Problem
&lt;/h2&gt;

&lt;p&gt;An empty chat interface is an empty box. Users don't know what to put in it.&lt;/p&gt;

&lt;p&gt;ChatGPT solved this with extreme discoverability (suggestions, examples, share links) and massive brand awareness that primed users with expectations before they ever opened the product.&lt;/p&gt;

&lt;p&gt;Most AI startups don't have that. They put a text box on the page and expect users to discover the value proposition themselves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson&lt;/strong&gt;: Don't make users discover what your product is for. Make the first interaction so specific that the value is undeniable in 30 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Latency Kills
&lt;/h2&gt;

&lt;p&gt;The human-factors research here is old and consistent: response times above roughly 400ms (the classic Doherty threshold) break the flow of thought. In a text editor, even 200ms feels sluggish.&lt;/p&gt;

&lt;p&gt;Most AI products are built with the assumption that users will tolerate latency because the output is good. This is wrong. Users tolerate latency for asynchronous tasks (generate a report, draft this email). They abandon synchronous flows (autocomplete, search, inline suggestion) the moment latency becomes perceptible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson&lt;/strong&gt;: Design your product around your actual latency profile, not your aspirational one. Streaming is not optional.&lt;/p&gt;
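
&lt;p&gt;To make that concrete: assume a hypothetical &lt;code&gt;generate_stream&lt;/code&gt; generator that yields tokens as the model produces them. Total generation time is identical in both versions below; only the perceived latency changes, and that's the number users actually feel.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import sys

# Blocking: perceived latency is the full generation time.
def respond_blocking(prompt):
    print("".join(generate_stream(prompt)))  # user stares at nothing until done

# Streaming: perceived latency is time to first token.
def respond_streaming(prompt):
    for token in generate_stream(prompt):
        sys.stdout.write(token)   # paint each token as it arrives
        sys.stdout.flush()
    print()
&lt;/code&gt;&lt;/pre&gt;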

&lt;h2&gt;
  
  
  The Trust Ladder
&lt;/h2&gt;

&lt;p&gt;AI makes mistakes. This is a feature when the mistake surface is controlled — spell-checkers get away with false suggestions because the undo cost is one keystroke. It's a bug when the mistake surface is opaque — AI-generated code that compiles but does the wrong thing in production.&lt;/p&gt;

&lt;p&gt;Successful AI products calibrate the trust ladder deliberately:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with &lt;strong&gt;low-stakes, reversible&lt;/strong&gt; actions where wrong outputs are obvious&lt;/li&gt;
&lt;li&gt;Build user trust through consistent accuracy in that narrow domain&lt;/li&gt;
&lt;li&gt;Expand scope as trust accumulates&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Lesson&lt;/strong&gt;: Ship in the domain where a bad output is annoying, not catastrophic. Expand from there.&lt;/p&gt;
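
&lt;p&gt;One way to make the ladder explicit in a codebase is to tag every action with a stakes level and gate autonomy on accumulated trust. A sketch with invented names, not a prescription:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Stakes levels, ordered from harmless to irreversible.
REVERSIBLE, VISIBLE, IRREVERSIBLE = 0, 1, 2

STAKES = {
    "suggest_edit": REVERSIBLE,    # one keystroke to undo
    "send_draft":   VISIBLE,       # a wrong output is obvious
    "delete_file":  IRREVERSIBLE,  # never runs without a human
}

def autonomy_allowed(action, trust_level):
    stakes = STAKES.get(action, IRREVERSIBLE)   # unknown actions get worst case
    # stakes must not exceed earned trust, and irreversible never qualifies
    return stakes != IRREVERSIBLE and stakes in range(trust_level + 1)
&lt;/code&gt;&lt;/pre&gt;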

&lt;h2&gt;
  
  
  What to Build
&lt;/h2&gt;

&lt;p&gt;The best AI products I've seen share three traits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They have a &lt;strong&gt;clear escape hatch&lt;/strong&gt; — the user can always override or ignore the AI&lt;/li&gt;
&lt;li&gt;They make the AI's &lt;strong&gt;reasoning visible&lt;/strong&gt; — not just the output, but why&lt;/li&gt;
&lt;li&gt;They are &lt;strong&gt;embarrassingly narrow&lt;/strong&gt; at launch — one job, done extremely well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build the version that does one thing so well that users feel cheated by every alternative. The scope will expand naturally as trust grows.&lt;/p&gt;

&lt;p&gt;The AI companies that win won't be the ones with the best models. They'll be the ones with the deepest understanding of why their users show up every day.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>product</category>
      <category>strategy</category>
    </item>
    <item>
      <title>The AI Productivity Stack That Actually Fits a Busy Life</title>
      <dc:creator>Denis Moroz</dc:creator>
      <pubDate>Sat, 02 May 2026 11:04:43 +0000</pubDate>
      <link>https://dev.to/denis_moroz/the-ai-productivity-stack-that-actually-fits-a-busy-life-4p2k</link>
      <guid>https://dev.to/denis_moroz/the-ai-productivity-stack-that-actually-fits-a-busy-life-4p2k</guid>
      <description>&lt;p&gt;Every AI productivity article gives you a list. Twenty tools. Thirty shortcuts. A hypothetical morning routine that assumes you have three uninterrupted hours and no meetings before 10am.&lt;/p&gt;

&lt;p&gt;This isn't that. This is the actual workflow I use on days when I have back-to-back calls, something urgent comes in before 9am, and I still need to produce things that are good.&lt;/p&gt;

&lt;p&gt;I'm going to take you through it from morning to end of day, and I'll be honest about what AI handles, what it doesn't, and what it costs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Philosophy First
&lt;/h2&gt;

&lt;p&gt;The only AI stack that works long-term is one that removes friction at the exact moment friction appears — not in theory, but in your actual day. If using the tool requires switching contexts, explaining yourself from scratch, or trusting output you can't quickly verify, the tool won't stick.&lt;/p&gt;

&lt;p&gt;Every tool I've kept has passed this test: &lt;strong&gt;would I use this when I'm already tired and mildly overwhelmed?&lt;/strong&gt; If the answer is no, it lives in a demo video, not my workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  Morning: Clear the Cognitive Queue (15–20 minutes)
&lt;/h2&gt;

&lt;p&gt;Before I open email or check Slack, I do a quick brain dump into Claude. Not a structured prompt — literally just what's in my head. What's on my plate today. What I'm worried about. What I need to decide.&lt;/p&gt;

&lt;p&gt;Claude helps me sort it. Not because it has magical insight, but because having to articulate thoughts clearly enough for AI to understand them forces me to clarify them for myself. It's the externalizing-your-thinking trick, just faster than journaling.&lt;/p&gt;

&lt;p&gt;From this I get: a rough priority list for the day, and any decisions I was avoiding named explicitly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time:&lt;/strong&gt; 10 minutes&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Tool:&lt;/strong&gt; Claude&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; Free tier is enough for this. Pro ($20/month) if you want persistent memory across sessions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Late Morning: The Writing Block
&lt;/h2&gt;

&lt;p&gt;Most of what I produce professionally involves writing. I use AI at two specific points in that process:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First draft generation.&lt;/strong&gt; I'll give Claude a brief — the point I want to make, who I'm writing for, any specific constraints — and ask for a first draft. I don't use that draft directly. I use it as a thing to react to. Editing is faster than writing from a blank page, and AI drafts give me something concrete to push against.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Polish pass.&lt;/strong&gt; After I've written my own version, I'll run it through one more pass asking Claude to catch anything that's unclear, redundant, or where my argument is weaker than I think it is. It catches about 30% of what a good editor would catch. Not a replacement for a real editor, but a useful first filter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time:&lt;/strong&gt; Variable, but saves me 20–40% off my normal writing time&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Tool:&lt;/strong&gt; Claude&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Note:&lt;/strong&gt; Perplexity for any section that requires factual claims I want to verify before publishing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Midday: Research and Rabbit Holes
&lt;/h2&gt;

&lt;p&gt;When I need to understand something new quickly — a topic I'm writing about, a company I'm preparing to talk to, a technical concept I need to explain accurately — Perplexity is my first stop.&lt;/p&gt;

&lt;p&gt;It gives me: a direct summary, source citations I can check, and usually a few related questions I hadn't thought to ask. It replaced about 80% of my Google search habit for this kind of research.&lt;/p&gt;

&lt;p&gt;I still go to primary sources. But Perplexity dramatically reduces the number of tabs I have to open before I find the thing I actually wanted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time:&lt;/strong&gt; Saves me 10–15 minutes per research task&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Tool:&lt;/strong&gt; Perplexity&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; Free tier handles most use cases. Pro (~$20/month) for more complex research.&lt;/p&gt;




&lt;h2&gt;
  
  
  Afternoon: Async Communication
&lt;/h2&gt;

&lt;p&gt;This is where AI earns back the most time for me across a week.&lt;/p&gt;

&lt;p&gt;I get a lot of messages that require a thoughtful response but are not urgent. Instead of writing each one from scratch, I paste the message into Claude with brief context on who it's from and what I actually want to say in response. It drafts something. I edit it to sound like me, verify the facts, and send.&lt;/p&gt;

&lt;p&gt;I'm not outsourcing my relationships. I'm outsourcing the blank-page problem on the 12th message of the afternoon when my brain is tired and I know exactly what I want to say but can't get the phrasing right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time:&lt;/strong&gt; 3–5 minutes per complex message instead of 10–15&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Tool:&lt;/strong&gt; Claude&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Important caveat:&lt;/strong&gt; I read every draft carefully before sending. AI doesn't know my relationships, the full history, or the nuances. It drafts. I decide.&lt;/p&gt;




&lt;h2&gt;
  
  
  End of Day: Capture and Close
&lt;/h2&gt;

&lt;p&gt;Last 10 minutes of my work day: I tell Claude what got done, what didn't, and what's carrying over to tomorrow. It helps me reframe the next day's priorities and sometimes catches when I'm being overly ambitious about what I can carry forward.&lt;/p&gt;

&lt;p&gt;I also use Notion AI to turn rough notes from the day into usable records — meeting notes cleaned up, decisions documented, action items extracted. This used to take me 20 minutes. It takes 5 now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time:&lt;/strong&gt; 10 minutes&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Tools:&lt;/strong&gt; Claude + Notion AI&lt;br&gt;&lt;br&gt;
&lt;strong&gt;What I skip:&lt;/strong&gt; I don't use any automated "capture everything" tool. Too much noise.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Cost Breakdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Perplexity&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notion AI&lt;/td&gt;
&lt;td&gt;Add-on to existing Notion plan&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor (for code)&lt;/td&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$50/month&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;$50/month for tools I use every single day is, in my experience, the easiest ROI calculation in my budget. The time return is several hours per week. I'd pay twice this without thinking about it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Don't Use (And Why)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dedicated AI assistants with their own interfaces.&lt;/strong&gt; I tried several. The context-switching was the killer — I didn't want to open a separate tool, I wanted AI where the work already was.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automated workflows that run without me.&lt;/strong&gt; I've experimented with these. They save time. They also introduce errors I find days later. Until I have a specific high-volume task where the error rate is acceptable, I prefer AI in the loop rather than AI in control.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools that require daily onboarding.&lt;/strong&gt; If I have to re-explain my context every time I open the app, I won't use it consistently.&lt;/p&gt;




&lt;h2&gt;
  
  
  The One-Sentence Version
&lt;/h2&gt;

&lt;p&gt;The AI stack that fits a busy life is small, lives where your work already lives, removes friction at the exact moments friction is highest, and requires you to stay in the loop.&lt;/p&gt;

&lt;p&gt;That's it. You don't need twenty tools. You need three or four good ones you actually open.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Thanks for following along — if these posts have been useful, the newsletter is the best way to stay connected.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aiforyourlife</category>
      <category>productivity</category>
      <category>tools</category>
    </item>
  </channel>
</rss>
