David Ohnstad

Posted on Jun 5 • Originally published at davidohnstad.net

Enterprise AI Budget Waste: Three Costly Mistakes

#ai #machinelearning #technology #programming

This article was originally published on davidohnstad.net. I cross-post here to reach the Dev.to community.

Three Ways We're About to Waste Enterprise AI Budgets (Again)

We built an AI-powered anomaly detection feature into our analytics platform last year. It cost $340,000 in engineering time, model training infrastructure, and third-party API spend. Three months after launch, usage data showed eleven active users, eight of them on the product team. The other three were clicking around because they thought it was required for their quarterly review.

The model worked. The predictions were accurate. The interface was clean. And nobody needed it because we never asked what decision it would change or what workflow it would improve. According to Gartner's 2024 AI Readiness Assessment, 72% of enterprise AI features launched in the previous year had adoption rates below 15% — not because the technology failed, but because the business case was never validated before engineering started building.

We're entering the second-half budget planning cycle right now, and I'm watching enterprise teams make the same mistake in slow motion. The pitch decks all say "AI strategy." The roadmaps all have LLM integrations and predictive analytics modules. But when you ask what specific user problem each feature solves, you get vague statements about "enabling data-driven decisions" or "improving efficiency." That's not a product strategy. That's a press release looking for a justification.

Why AI Feature Selection Breaks Down at Enterprise Scale

The failure mode isn't technical — it's organizational. Most enterprise AI initiatives start with a CTO or VP who attended a conference, saw a demo, and came back convinced the company needs "more AI." Engineering gets a mandate to explore machine learning capabilities. Product gets asked to identify use cases. And nobody stops to build the connective tissue between what's technically possible and what would actually move a business metric.

The gap shows up in three places. First, engineering teams optimize for technical sophistication rather than user workflow integration. A recommendation engine that requires six clicks and a data export to use will lose to a simple dropdown filter every time, even if the ML model is objectively better. Second, product teams treat AI features as additions rather than replacements — they stack new capabilities on top of existing workflows instead of asking what manual steps the AI should eliminate. Third, executive sponsors measure success by launch announcements rather than sustained usage or measurable ROI.

According to McKinsey's 2024 State of AI Report, organizations that tie AI investments directly to specific operational metrics see 3.2x higher adoption rates than those pursuing general "innovation" mandates. The difference isn't the quality of the models — it's whether anyone defined what success looks like before the first sprint started.

I've seen this pattern repeat across multiple enterprise software rollouts, and the warning signs are always the same: roadmap items phrased as capabilities rather than outcomes, user research that asks "would you use this?" instead of "what decision would this change?", and business cases built on TAM expansion rather than workflow replacement. When those three show up together, you're about to spend six months building something impressive that nobody will open twice.

The Feature Justification Stack: A Pre-Build Validation Framework

Before any AI feature gets engineering resources, it needs to pass through four validation layers. This isn't a prioritization exercise — it's a go/no-go filter. If a feature can't clear all four layers with specific, documented answers, it doesn't belong on the roadmap yet. Most enterprise teams skip straight to technical feasibility and wonder why adoption is weak. The framework forces you to validate the problem before you validate the solution.

Layer 1: Workflow Disruption Mapping. Identify the exact manual steps the AI feature will replace or eliminate. Not "improve" — replace. Draw the current user workflow with specific actions: opens report, filters by region, exports to Excel, builds pivot table, emails summary to stakeholders. Now draw the workflow with the AI feature active. If the new workflow still has more than two manual steps, you haven't disrupted the workflow — you've added a detour. The feature fails Layer 1.
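The Layer 1 check is simple enough to sketch in Python. The "before" steps come straight from the example workflow above. The `Step` type, the `manual` flag, the "after" steps, and the `passes_layer_1` helper are hypothetical names I'm introducing for illustration, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    manual: bool  # True if a person has to perform this step by hand


MAX_MANUAL_STEPS = 2  # Layer 1 threshold: more than two manual steps is a detour


def manual_steps(workflow: list[Step]) -> int:
    """Count the steps a user still performs by hand."""
    return sum(1 for step in workflow if step.manual)


def passes_layer_1(after: list[Step]) -> bool:
    """Clear Layer 1 only if the new workflow has at most two manual steps."""
    return manual_steps(after) <= MAX_MANUAL_STEPS


# Current workflow, taken from the example in the text.
before = [
    Step("opens report", manual=True),
    Step("filters by region", manual=True),
    Step("exports to Excel", manual=True),
    Step("builds pivot table", manual=True),
    Step("emails summary to stakeholders", manual=True),
]

# Hypothetical workflow with the AI feature active (illustrative only).
after = [
    Step("summary generated on schedule", manual=False),
    Step("reviews summary", manual=True),
    Step("approves send to stakeholders", manual=True),
]

print(manual_steps(before), manual_steps(after), passes_layer_1(after))
```

The point of writing it down this way is that the "after" list has to name real steps. If you can't fill it in, you haven't finished Layer 1.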

Layer 2: Decision Delta Quantification. Name the specific decision that will change because of this feature, and quantify what "better" looks like. "Faster insights" is not a decision delta — it's a hope. "Reduces time from data request to vendor selection from 11 days to 3 days, enabling quarterly contract reviews instead of annual reviews" is a decision delta. You need a number, a timeframe, and a named business process that will operate differently. If you can't write that sentence with specifics, the feature isn't ready for engineering yet.

Layer 3: Data Readiness Audit. This is where most teams discover they're not as ready as they thought. Map every data input the AI feature needs: source system, refresh frequency, schema stability, historical depth, and known quality issues. Then validate that all of those inputs exist in production today — not "we could build a pipeline" or "we're planning to integrate that system next quarter." If the data doesn't exist now, the feature launch date is fantasy. According to Forrester's 2024 Data Infrastructure Survey, 58% of enterprise AI delays are caused by data pipeline gaps discovered after development starts, not model performance issues.
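One way to keep the Layer 3 audit honest is to record each input as structured data and let a small function list the blockers. This is a minimal sketch: the field names mirror the attributes listed above, but `DataInput`, `readiness_blockers`, the history threshold, and the sample inputs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataInput:
    name: str
    source_system: str
    refresh_frequency: str        # e.g. "daily", "weekly"
    schema_stable: bool           # has the schema held steady in production?
    historical_depth_months: int  # how much history exists today
    in_production: bool           # exists now, not "planned for next quarter"
    known_quality_issues: list[str] = field(default_factory=list)


def readiness_blockers(inputs: list[DataInput], required_history_months: int) -> list[str]:
    """Return human-readable blockers; an empty list means the audit passes."""
    blockers = []
    for item in inputs:
        if not item.in_production:
            blockers.append(f"{item.name}: not in production today")
        if not item.schema_stable:
            blockers.append(f"{item.name}: schema is not stable")
        if item.historical_depth_months < required_history_months:
            blockers.append(
                f"{item.name}: {item.historical_depth_months} months of history, "
                f"need {required_history_months}"
            )
        for issue in item.known_quality_issues:
            blockers.append(f"{item.name}: {issue}")
    return blockers


# Hypothetical inputs for a single feature.
inputs = [
    DataInput("orders", "ERP", "daily", True, 36, True),
    DataInput("vendor_scores", "procurement tool", "weekly", True, 6, False,
              ["timestamps lack timezone metadata"]),
]

for blocker in readiness_blockers(inputs, required_history_months=12):
    print(blocker)
```

Any non-empty result is a no-go for Layer 3, no matter how promising the model looks.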

Layer 4: Feedback Loop Definition. Before you build the feature, design how you'll know if it's working. Not vanity metrics like "daily active users" — those measure curiosity, not value. Define the instrumentation that will tell you whether the feature is changing the decision you identified in Layer 2. If the goal was reducing vendor selection time, you need telemetry that captures: how many users complete the full workflow, how often they override the AI recommendation, and whether contract review frequency actually increases. The feedback loop is part of the feature scope. If it's not in the requirements doc, the feature isn't done.
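The first two telemetry signals above can be computed from a plain event log. The sketch below assumes a hypothetical event schema (a `user` and a `type` per event) and hypothetical event names; a real product would substitute its own. Contract review frequency is not shown because it has to come from the business process itself, not from product telemetry.

```python
from collections import defaultdict

# Hypothetical event names; replace with your own telemetry schema.
STARTED = "workflow_started"
COMPLETED = "workflow_completed"
ACCEPTED = "recommendation_accepted"
OVERRIDDEN = "recommendation_overridden"


def feedback_metrics(events: list[dict]) -> dict:
    """Summarize completion and override rates from raw telemetry events."""
    users_by_event = defaultdict(set)
    counts = defaultdict(int)
    for event in events:
        users_by_event[event["type"]].add(event["user"])
        counts[event["type"]] += 1

    started = users_by_event[STARTED]
    completed = users_by_event[COMPLETED] & started
    decisions = counts[ACCEPTED] + counts[OVERRIDDEN]
    return {
        # share of users who started the workflow and also finished it
        "completion_rate": len(completed) / len(started) if started else 0.0,
        # share of AI recommendations that users replaced with their own choice
        "override_rate": counts[OVERRIDDEN] / decisions if decisions else 0.0,
    }


# Illustrative events: four users start, two finish, one overrides the AI.
events = [
    {"user": "a", "type": STARTED},
    {"user": "a", "type": ACCEPTED},
    {"user": "a", "type": COMPLETED},
    {"user": "b", "type": STARTED},
    {"user": "b", "type": OVERRIDDEN},
    {"user": "b", "type": COMPLETED},
    {"user": "c", "type": STARTED},
    {"user": "d", "type": STARTED},
]

metrics = feedback_metrics(events)
print(metrics)
```

A high override rate alongside a high completion rate is worth a look: users are finishing the workflow but not trusting the recommendation.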

The counterintuitive part: Layer 3 should often kill features you're excited about. If the data isn't production-ready, no amount of model sophistication will save the launch. I've watched teams burn four months on a predictive analytics feature before discovering that the source system stored its timestamps in local time without timezone metadata, making historical trend analysis impossible. The model was brilliant. The data was garbage. Layer 3 would have caught that in week one.
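The timestamp problem is the kind of thing a week-one check can surface. Here is a minimal sketch using Python's standard `datetime` module, assuming the source system exports ISO 8601 strings. The sample values and the helper names are made up for illustration.

```python
from datetime import datetime


def is_timezone_aware(value: datetime) -> bool:
    """True when the datetime carries a usable UTC offset."""
    return value.tzinfo is not None and value.utcoffset() is not None


def naive_timestamps(raw_values: list[str]) -> list[str]:
    """Return the ISO 8601 strings that parse but carry no timezone offset."""
    return [
        raw for raw in raw_values
        if not is_timezone_aware(datetime.fromisoformat(raw))
    ]


# Hypothetical sample pulled from a source table.
sample = [
    "2024-03-10T01:30:00",        # local time, no offset: ambiguous
    "2024-03-10T01:30:00-06:00",  # offset present: safe to compare
    "2024-03-10T07:30:00+00:00",
]

print(naive_timestamps(sample))
```

Running something like this against a sample of every time column during the data readiness audit costs minutes, not months.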

When I Watched a $280K AI Feature Die in User Testing

We built a natural language query interface for our data warehouse. The pitch was straightforward: business users could ask questions in plain English instead of writing SQL or waiting for analyst support. The engineering team trained a fine-tuned model on our schema, built a clean chat interface, and launched a beta to 50 users. The demo was flawless. Executives loved it. We planned a full rollout for Q3.

Then we ran structured user observation sessions — not surveys, actual watch-them-work sessions. What we discovered: users weren't asking questions because they didn't know what questions to ask. The analysts who used the system regularly already knew SQL and preferred it because the query was explicit and reviewable. The business users who couldn't write SQL also couldn't formulate questions specific enough for the model to return useful results. They'd type "show me sales trends" and get frustrated when the system asked them to specify a date range, product category, and region. The feature assumed a level of data literacy that didn't exist in the target user base.

The failure wasn't the NLP model — it was the workflow assumption. We thought the barrier was SQL syntax. The actual barrier was not knowing what analysis to run in the first place. That's a training problem and a data discovery problem, not a natural language interface problem. If we'd run Layer 1 validation — mapping the actual user workflow, not the idealized one — we would have seen that users didn't start with questions. They started with reports someone else built, then modified filters. The AI feature we should have built was report recommendation based on role and recent activity, not a query interface.

We deprecated the feature eight months after launch. The $280,000 in sunk cost hurt, but the bigger cost was the eight months of roadmap space we gave to something that failed Layer 1. I still use that launch as the reference case when product teams ask why the Feature Justification Stack requires workflow disruption mapping before technical feasibility. If you can't draw the before/after workflow with specific steps, you don't understand the problem well enough to build a solution yet.

Stop Prioritizing AI Features by Technical Wow Factor

Most enterprise roadmaps rank AI features by model sophistication or competitive positioning: "Company X launched a recommendation engine, so we need one too." That's not product strategy — that's feature parity theater. The features that drive adoption and ROI are rarely the most technically impressive. They're the ones that eliminate the most manual steps in the highest-frequency workflows.

A simple AI feature that auto-tags incoming support tickets and routes them to the right team will deliver more measurable value than a sophisticated sentiment analysis dashboard that requires manual CSV uploads and ten minutes of configuration before it shows anything useful. The tagging feature disappears into the workflow. The dashboard sits in a tab users open during quarterly reviews to check a box. According to Harvard Business Review's 2023 analysis of enterprise software adoption, features that require zero behavior change see 4.1x higher sustained usage than features that require new habits, regardless of technical sophistication.

This runs counter to how most AI budgets get allocated. Engineering teams want to work on interesting problems. Executives want to announce cutting-edge capabilities. And product managers get caught in the middle, trying to justify why the boring automation feature should get resources instead of the flashy predictive analytics module. But boring automation is what changes workflows. Flashy analytics is what gets screenshotted for LinkedIn.

The test: if you removed the AI feature tomorrow, would a specific operational process break or slow down? If the answer is no, and users would just go back to the manual workflow without meaningful friction, you've built a novelty, not a product. Novelties get demoed. Products get used daily without anyone thinking about them. My writing on data product management keeps returning to this distinction: the best data products are the ones users forget exist because they've become invisible infrastructure.

Where Enterprise AI Budgets Should Actually Go in Second-Half Planning

If you're finalizing AI investments for the rest of 2026, shift budget away from net-new features and toward feature integration depth. The AI capabilities that will move business metrics are the ones that eliminate steps in existing workflows, not the ones that add new dashboards or analysis tools. That means prioritizing AI features that reduce manual data preparation, auto-generate reports users already create weekly, and surface recommendations at decision points users already hit in their normal work.

Three investment categories that consistently outperform exploratory AI initiatives: workflow automation within existing product surfaces (not new standalone AI tools), predictive pre-population of form fields or configurations based on historical patterns, and automated quality checks that block bad data before it enters downstream systems. None of these are exciting conference talk material. In my experience, each can save users 10-30 minutes per day and measurably reduce error rates.

The budget allocation mistake I see most often: dedicating 70% of AI investment to new capabilities and 30% to instrumentation, training, and feedback loops. That ratio should be inverted. If you can't measure whether an AI feature is changing decisions, you can't optimize it or justify continued investment. And if users don't understand what the feature does or how to integrate it into their workflow, adoption will plateau at 15-20% regardless of model quality. The Feature Justification Stack forces those considerations upfront, before engineering resources get committed to features that will launch to silence.

The other under-invested area: data pipeline resilience and schema stability monitoring. AI features fail more often from upstream data issues than from model drift. If your source systems change schemas without notification or your ETL pipelines silently drop records when they hit edge cases, even a perfect ML model will return garbage. Budget 25% of AI investment toward data infrastructure observability — not sexy, but it's what keeps the features you've already built working reliably.

How to Run a Feature Justification Stack Audit This Week

Pull your current AI roadmap. For each feature in active development or planned for the next six months, write down four things: the specific manual workflow it replaces (Layer 1), the measurable decision delta it enables (Layer 2), the production data sources it requires with current status (Layer 3), and the instrumentation plan for measuring whether it's working (Layer 4). If you can't write all four with specifics, the feature isn't ready for engineering yet — it needs more product definition work.

The features that pass the audit with complete answers in all four layers are the ones that should get prioritized. The ones that have gaps in Layer 1 or Layer 2 need user research and workflow validation before they get resources. The ones that fail Layer 3 need data pipeline work before they get model development time. And the ones missing Layer 4 need instrumentation design as part of the feature scope, not as a "phase two" add-on.
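The triage rules above fit in a short function, which makes the audit easy to run across a whole roadmap. This is a sketch under stated assumptions: `FeatureAudit` and `next_action` are hypothetical names, the sample roadmap entries are placeholders, and checking the layers in order so the earliest gap wins is my choice for the example.

```python
from dataclasses import dataclass


@dataclass
class FeatureAudit:
    name: str
    workflow_replaced: str = ""        # Layer 1: the manual workflow it replaces
    decision_delta: str = ""           # Layer 2: the measurable decision change
    data_in_production: bool = False   # Layer 3: required data exists today
    instrumentation_plan: str = ""     # Layer 4: how you'll know it's working


def next_action(feature: FeatureAudit) -> str:
    """Map the earliest audit gap to the follow-up work described above."""
    if not feature.workflow_replaced.strip() or not feature.decision_delta.strip():
        return "user research and workflow validation"
    if not feature.data_in_production:
        return "data pipeline work"
    if not feature.instrumentation_plan.strip():
        return "instrumentation design in feature scope"
    return "prioritize for engineering"


# Placeholder roadmap entries for illustration.
roadmap = [
    FeatureAudit("ticket auto-tagging", "manual triage of support tickets",
                 "tickets reach the right team without manual routing",
                 True, "track how often tickets are rerouted"),
    FeatureAudit("sentiment dashboard"),
    FeatureAudit("vendor scoring", "manual vendor comparison",
                 "selection time from 11 days to 3 days", False),
]

for feature in roadmap:
    print(f"{feature.name}: {next_action(feature)}")
```

A non-empty string is a weak bar, so treat this as a checklist, not a judge: someone still has to read each answer and decide whether it is specific enough.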

This audit typically kills 30-40% of the features on an enterprise AI roadmap — not because they're bad ideas, but because they're not ready yet. That's a feature, not a bug. Better to discover that in planning than in month six of development when you realize the data doesn't exist or the workflow assumption was wrong. According to IDC's 2024 Enterprise Software Development study, organizations that implement formal validation gates before engineering sprints start see 47% fewer late-stage scope changes and 31% faster time-to-adoption for features that do launch.

What is an AI feature prioritization framework for enterprise software?

An AI feature prioritization framework evaluates potential AI capabilities against business workflow integration, data readiness, and measurable outcomes before engineering resources are allocated. The Feature Justification Stack specifically requires each feature to clear four validation layers: workflow disruption mapping, decision delta quantification, data readiness audit, and feedback loop definition. Features that can't pass all four layers aren't ready for development yet.

How do you validate AI feature ideas before development starts?

Map the exact manual workflow the AI feature will replace, quantify the specific decision that will change, audit whether the required data exists in production today, and design the instrumentation that will measure whether the feature is working. If any of those four validations reveal gaps, the feature needs more product definition work before engineering starts building. This prevents investing months in technically sound features that users don't adopt.

Why do enterprise AI features fail after launch?

Most enterprise AI features fail because they're designed as additions to existing workflows rather than replacements for manual steps. Features that require users to change habits or learn new tools see adoption rates below 20%, even when the underlying models are accurate. Successful AI features eliminate steps in high-frequency workflows and surface recommendations at decision points users already encounter in their normal work, requiring zero behavior change.

Two Takeaways for Different Roles

For practitioners: Run the Feature Justification Stack audit on your current roadmap this week. Any feature that can't clear all four layers with specific, documented answers needs more product definition work before it gets engineering time. Prioritize AI features by workflow disruption and decision delta, not by technical sophistication or competitive positioning. The boring automation that saves users 15 minutes daily will outperform the impressive analytics dashboard that gets opened quarterly.

For leaders: Reallocate second-half AI budgets away from exploratory features and toward integration depth, data pipeline observability, and instrumentation. The ROI comes from features that disappear into existing workflows, not standalone AI tools that require training and behavior change. Measure success by sustained usage and measurable decision deltas, not by launch announcements or feature count. If your teams can't quantify the decision that will change because of an AI feature, don't fund it yet.

When was the last time you audited whether your AI roadmap is full of features users will actually adopt, or capabilities engineering teams want to build? The difference between those two lists is where budgets go to die. When you're building AI and machine learning features for enterprise software that users actually need, the validation framework matters more than the model architecture.

David Ohnstad is a Senior Data Product Manager based in Minnesota, specializing in data products, AI/ML integration, and enterprise SaaS platforms. Follow his work at github.com/davidohnstad40-netizen.
