Why AI Models Fail in Enterprise: The 89% Problem


This article was originally published on davidohnstad.net. I cross-post here to reach the Dev.to community.

We Built an AI-Powered Vendor Risk Tool Nobody Used

We spent fourteen months building an AI-powered vendor risk assessment system. The engineering team was proud of the model's recall rates. Security leadership presented it to the board as a competitive differentiator. Procurement got trained on the new workflow. Then we ran the usage analytics six months post-launch: 11% of vendor assessments actually used the AI scoring. The other 89% defaulted to the manual checklist we'd promised to deprecate. According to Gartner's 2024 Enterprise AI Adoption Survey, we weren't outliers — 72% of AI features shipped in enterprise compliance tools see adoption below 30% within the first year.

The problem wasn't the model. The precision was fine. The interface was clean. The problem was we never asked whether AI was the right tool for the decision we were automating. We assumed that because vendor risk could use AI, it should. That assumption cost us a year of roadmap bandwidth and created a product feature that procurement teams actively route around.

David Ohnstad has seen this pattern repeat across enterprise software implementations: teams shipping AI features because the technology exists, not because the business process demands it. The gap isn't technical capability — it's decision architecture. Most vendor risk workflows don't need probabilistic scoring. They need consistent application of known rules, audit trails, and clear escalation paths. AI introduces uncertainty into a process where certainty is the entire value proposition.

The AI Justification Framework: Five Gates Before You Build

This is a five-gate decision framework for determining whether an enterprise workflow justifies AI implementation or whether rule-based logic delivers better business value. Each gate is a go/no-go decision. If you can't clearly pass a gate, the answer is: build the simpler system first, instrument it thoroughly, and revisit AI when you have the usage data to justify the complexity.

Gate 1: Volume Justification. Does this workflow process enough transactions per month that manual execution creates a measurable bottleneck? The threshold isn't arbitrary: calculate the cost of manual review time multiplied by transaction volume. If automating with simple rules delivers 80% of the time savings at 20% of the implementation cost, AI fails this gate. For vendor risk, most mid-market companies assess 40-80 new vendors per year. At 2 hours per assessment, that's 80 to 160 hours annually. A rule-based system handles that volume easily. AI introduces model maintenance overhead (retraining, drift monitoring, explainability tooling) that can exceed the manual cost you're trying to eliminate. David Ohnstad worked with a SaaS company that built an AI contract review system for 60 contracts per quarter. The model required a dedicated ML engineer to maintain. A paralegal reviewing contracts manually would have cost less and delivered higher accuracy.
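The Gate 1 arithmetic fits in a few lines. This is a minimal sketch, not a costing model: the vendor counts and the 2 hours per assessment come from the paragraph above, while the hourly rate, the upkeep figure passed to the gate check, and the function names are hypothetical placeholders.

```python
def annual_manual_hours(vendors_per_year: int, hours_per_assessment: float) -> float:
    """Total hours spent on manual assessments in one year."""
    return vendors_per_year * hours_per_assessment


def annual_manual_cost(vendors_per_year: int, hours_per_assessment: float,
                       hourly_rate: float) -> float:
    """Manual review cost: review time multiplied by volume and an hourly rate."""
    return annual_manual_hours(vendors_per_year, hours_per_assessment) * hourly_rate


def passes_volume_gate(manual_cost: float, ai_annual_upkeep: float) -> bool:
    """AI passes Gate 1 only if its yearly upkeep is below the manual cost it replaces."""
    return ai_annual_upkeep < manual_cost


# Figures from the text: 40-80 new vendors per year, 2 hours per assessment.
low_hours = annual_manual_hours(40, 2.0)
high_hours = annual_manual_hours(80, 2.0)

# ASSUMPTION: a loaded reviewer rate of 100 per hour, chosen only for illustration.
high_cost = annual_manual_cost(80, 2.0, hourly_rate=100.0)

print(f"Manual effort: {low_hours:.0f} to {high_hours:.0f} hours per year")
print(f"Upper-bound manual cost: {high_cost:.0f}")
```

Plug in your own rate and upkeep estimate. If the upkeep number is larger than the manual cost even at the top of the volume range, the workflow does not clear this gate.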

Gate 2: Uncertainty Tolerance. Does the business process tolerate probabilistic outputs, or does it require deterministic results with full auditability? Vendor risk decisions are discrete: approved, rejected, or escalated. Procurement teams don't act on a 73% risk score; they need a clear recommendation with a documented rationale. AI models produce probability scores, not certainties. That's valuable in fraud detection, where you're scoring thousands of transactions and investigating the top 2% by risk score. It's friction in vendor onboarding, where every decision requires a human sign-off anyway. If your workflow already has a human-in-the-loop gate that reviews 100% of outputs, AI isn't removing the bottleneck; it's adding a pre-processing step that obscures the decision logic. Rule-based systems produce auditable decision trees: "Vendor rejected because SOC 2 report expired 90+ days ago." AI systems produce: "Vendor scored 0.68 on risk index due to weighted feature importances across 140 input variables." The second explanation doesn't survive a compliance audit.
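To make the contrast concrete, here is a rough sketch of a deterministic rule that returns an outcome together with its rationale. The `Decision` type, the `check_soc2` function, and the choice to escalate reports that expired less than 90 days ago are illustrative assumptions, not the system described in this article.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Decision:
    outcome: str    # "approved", "rejected", or "escalated"
    rationale: str  # human-readable reason suitable for an audit trail


def check_soc2(expiry: Optional[date], today: date, limit_days: int = 90) -> Decision:
    """Hypothetical rule: reject when the SOC 2 report is missing or expired 90+ days."""
    if expiry is None:
        return Decision("rejected", "No SOC 2 report on file.")
    days_expired = (today - expiry).days
    if days_expired >= limit_days:
        return Decision(
            "rejected",
            f"SOC 2 report expired {days_expired} days ago (limit: {limit_days}).",
        )
    if days_expired > 0:
        # ASSUMPTION: recently expired reports go to a human instead of auto-rejecting.
        return Decision(
            "escalated",
            f"SOC 2 report expired {days_expired} days ago; renewal must be confirmed.",
        )
    return Decision("approved", "SOC 2 report is current.")


decision = check_soc2(expiry=date(2024, 1, 1), today=date(2024, 6, 1))
print(decision.outcome, "-", decision.rationale)
```

The same inputs always produce the same outcome and the same sentence, which is the property an auditor is looking for.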

Gate 3: Data Sufficiency. Do you have enough labeled training data to build a model that outperforms heuristics, and can you refresh that dataset continuously without heroic manual effort? This is where most enterprise AI projects fail quietly. You need thousands of labeled examples to train a model, and you need ongoing labeled data to retrain as vendor risk patterns evolve (new compliance frameworks, shifting geopolitical risks, emerging threat vectors). If your "training data" is 200 manually assessed vendors from the last three years, and your security team doesn't have bandwidth to label 50 new examples per quarter, you're building a model on a frozen snapshot of the world. It will drift. Fast. According to McKinsey's 2023 State of AI Report, 67% of enterprise ML models experience significant performance degradation within 18 months due to data drift, and fewer than 40% of organizations have automated retraining pipelines in place. Rule-based systems don't drift — you update the rules when regulations change, and the logic remains transparent.

Gate 4: Failure Cost Asymmetry. What happens when the system is wrong? If false positives cost more than false negatives (or vice versa), and you can't easily adjust the decision threshold post-deployment, AI introduces unmanageable risk. In vendor risk, false negatives are catastrophic: you approve a vendor who later causes a data breach, regulatory violation, or supply chain disruption. False positives are annoying but recoverable: you reject a low-risk vendor, they appeal, you override. By default, most models are trained to maximize overall accuracy, not to reflect asymmetric failure costs. You can compensate by adjusting decision thresholds, but that requires ongoing monitoring, A/B testing, and exec buy-in for each threshold change. Rule-based systems encode your risk tolerance directly: "Auto-reject if SOC 2 missing OR data processing addendum unsigned OR headquartered in embargoed jurisdiction." No model tuning required. The rules match your actual risk appetite, and you change them when your appetite changes.
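A small worked example shows why the accuracy-optimal threshold and the cost-optimal threshold can diverge. Everything here is a toy illustration: the scores, labels, candidate thresholds, and cost weights are invented for the sketch and are not results from any real model.

```python
from typing import Iterable, List, Tuple

# ASSUMPTION: toy data as (model risk score, vendor actually risky?) pairs.
SCORED: List[Tuple[float, bool]] = [
    (0.10, False), (0.20, False), (0.30, True), (0.40, False), (0.55, False),
    (0.60, True), (0.70, False), (0.80, True), (0.90, True),
]


def confusion(threshold: float) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) when rejecting scores >= threshold."""
    fp = sum(1 for score, risky in SCORED if score >= threshold and not risky)
    fn = sum(1 for score, risky in SCORED if score < threshold and risky)
    return fp, fn


def total_cost(threshold: float, fp_cost: float, fn_cost: float) -> float:
    """Weighted cost of the errors made at a given threshold."""
    fp, fn = confusion(threshold)
    return fp * fp_cost + fn * fn_cost


def best_threshold(candidates: Iterable[float], fp_cost: float, fn_cost: float) -> float:
    """Pick the candidate threshold with the lowest weighted error cost."""
    return min(candidates, key=lambda t: total_cost(t, fp_cost, fn_cost))


candidates = [0.25, 0.50, 0.75]
# Equal costs: equivalent to minimizing the raw error count (maximizing accuracy).
accuracy_pick = best_threshold(candidates, fp_cost=1, fn_cost=1)
# ASSUMPTION: one missed risky vendor costs 50x one wrongly rejected vendor.
asymmetric_pick = best_threshold(candidates, fp_cost=1, fn_cost=50)
print(accuracy_pick, confusion(accuracy_pick))
print(asymmetric_pick, confusion(asymmetric_pick))
```

On this toy data the equal-cost pick is 0.75, which misses two risky vendors, while the asymmetric pick is 0.25, which misses none at the price of three false rejections. Every change to the cost weights moves the threshold, and that is the ongoing tuning work the paragraph above describes.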

Gate 5: Explainability Requirement. Can you ship a decision to users without explaining why the system reached that conclusion, or does every output need a traceable rationale? Procurement teams, legal counsel, and compliance officers don't trust black-box scores. They need to defend vendor decisions to auditors, executives, and occasionally vendors who dispute a rejection. "Our AI model flagged your company as high-risk" is not a defensible explanation. "You failed three of our five vendor security requirements: no SOC 2 Type II report, no cyber liability insurance, and your data processing addendum doesn't meet GDPR standards" is defensible. If your user base won't act on a recommendation they can't explain, AI fails this gate. Explainable AI (SHAP values, LIME, attention weights) helps, but it adds another layer of tooling, user training, and ongoing maintenance. Enterprise AI pilots and proof-of-concept implementations frequently underestimate how much friction explainability tooling adds to the user workflow.
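The defensible explanation quoted above falls out of a rule-based check almost for free. The sketch below assumes a simple checklist of five requirements; the first three mirror the example in the text, and the last two are hypothetical fillers added only to reach five.

```python
from typing import Dict, List

# Requirement id -> the phrase shown to the vendor when the requirement is not met.
REQUIREMENTS: Dict[str, str] = {
    "soc2_type2": "no SOC 2 Type II report",
    "cyber_insurance": "no cyber liability insurance",
    "dpa_gdpr": "data processing addendum doesn't meet GDPR standards",
    "mfa_enforced": "multi-factor authentication is not enforced",  # hypothetical
    "incident_plan": "no documented incident response plan",        # hypothetical
}


def failed_requirements(vendor: Dict[str, bool]) -> List[str]:
    """Ids of requirements the vendor does not meet (a missing answer counts as unmet)."""
    return [key for key in REQUIREMENTS if not vendor.get(key, False)]


def explain(vendor: Dict[str, bool]) -> str:
    """Build the rationale directly from the rules that failed."""
    failed = failed_requirements(vendor)
    if not failed:
        return f"You met all {len(REQUIREMENTS)} vendor security requirements."
    reasons = "; ".join(REQUIREMENTS[key] for key in failed)
    return (f"You failed {len(failed)} of our {len(REQUIREMENTS)} "
            f"vendor security requirements: {reasons}.")


vendor = {"soc2_type2": False, "cyber_insurance": False, "dpa_gdpr": False,
          "mfa_enforced": True, "incident_plan": True}
print(explain(vendor))
```

No separate explainability layer is needed here because the explanation and the decision come from the same list.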

The Vendor Risk Mistake: Why We Built the Wrong Tool

David Ohnstad's team failed Gate 2 and Gate 5 before a single line of code was written. The vendor risk workflow required deterministic outputs with full auditability — exactly the scenario where rule-based systems excel and AI introduces unnecessary complexity. But the product roadmap had "AI-powered risk assessment" as a committed feature for two quarters, driven by competitive pressure and board-level interest in the company's AI strategy. The team built the feature because it was on the roadmap, not because the workflow justified it.

The technical implementation worked. The model ingested vendor questionnaires, security documentation, financial data, and third-party threat intelligence feeds. It output a 0-1 risk score with reasonable precision. The problem surfaced during user acceptance testing: procurement teams couldn't explain why a vendor scored 0.72 instead of 0.68, and they couldn't override the score without escalating to security leadership. The old checklist let them approve low-risk vendors in 15 minutes with a clear paper trail. The AI system required 30 minutes of data entry, produced a score they didn't trust, and forced escalations for edge cases the rules used to handle automatically.

Six months after launch, procurement had informally reinstated the manual checklist for 89% of assessments. They used the AI system only for vendors that triggered automatic escalation flags (high transaction volume, access to sensitive data, geographically distributed infrastructure). For standard SaaS vendors, marketing agencies, and low-risk contractors, the checklist was faster, more transparent, and required less training. The AI feature became a compliance theater checkbox: "Yes, we have an AI-powered vendor risk platform" for RFP responses, but not the actual operational system.

What would David Ohnstad do differently? Ship the rule-based system first. Instrument it thoroughly: track which rules trigger most often, where users request overrides, how often edge cases require manual review, and which vendor categories consume the most assessment time. After 12-18 months of production usage, analyze whether AI could improve the specific bottlenecks you've measured — not the hypothetical ones you assumed existed. If 80% of assessments resolve cleanly with six deterministic rules, and 15% require human judgment on qualitative factors, and 5% involve complex risk modeling across dozens of variables... you've identified where AI might add value (that 5%), and you've avoided building it for the 95% where it creates friction. This approach also builds the labeled training dataset you need: every manually reviewed vendor becomes a training example, and you're collecting data on the actual edge cases that matter, not synthetic scenarios.
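The instrumentation described above can start very small. This is a minimal in-memory sketch using only the standard library; the `RuleTelemetry` class and the rule ids are hypothetical, and a production system would write these events to whatever analytics store it already uses.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RuleTelemetry:
    """Counts how often each rule fires and how often users override it."""
    triggers: Counter = field(default_factory=Counter)
    overrides: Counter = field(default_factory=Counter)

    def record(self, rule_id: str, overridden: bool = False) -> None:
        """Log one rule firing, optionally marking it as overridden by a user."""
        self.triggers[rule_id] += 1
        if overridden:
            self.overrides[rule_id] += 1

    def override_rate(self, rule_id: str) -> float:
        """Share of firings that users overrode; 0.0 if the rule never fired."""
        fired = self.triggers[rule_id]
        return self.overrides[rule_id] / fired if fired else 0.0


telemetry = RuleTelemetry()
telemetry.record("soc2_expired")
telemetry.record("soc2_expired", overridden=True)
telemetry.record("dpa_unsigned")

print(telemetry.triggers.most_common(1))
print(telemetry.override_rate("soc2_expired"))
```

Rules with a high override rate are candidates for rewriting. Assessments that no rule resolves are the edge cases worth labeling for any future model.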

Stop Treating AI as a Feature and Start Treating It as a Cost Center

Most product roadmaps list AI features the same way they list UI improvements or API expansions: as discrete capabilities that deliver user value. That framing is wrong for enterprise software. AI isn't a feature — it's a permanent operational expense that compounds over time. Every AI model you ship requires ongoing monitoring, retraining, drift detection, explainability tooling, and incident response when predictions go wrong. According to Forrester's 2024 AI Operations Survey, enterprises spend an average of $180,000 annually per production ML model on maintenance, monitoring, and retraining infrastructure — and that figure excludes the initial development cost.

Rule-based systems have upfront complexity (defining the rules, handling edge cases, building override workflows) but near-zero marginal maintenance cost once deployed. AI systems have back-loaded complexity: they're exciting to build, they demo well, and they fail slowly over months as data distributions shift and model performance degrades. If you can't commit to staffing an ML engineer and a data analyst to maintain the model for the next three years, you're not ready to ship an AI feature; you're accruing technical debt you can't service. Enterprise AI adoption failures often trace back to this misconception: teams budget for model development but not for the operational overhead required to keep the model useful.

This doesn't mean "never build AI features." It means: be honest about whether the incremental value AI delivers justifies the long-term operational cost differential versus simpler automation. For vendor risk management, rule-based automation delivers 85% of the time savings, 100% of the auditability, and 10% of the maintenance burden. That's not an argument against AI generally — it's an argument for choosing the right tool for the specific workflow you're automating. David Ohnstad now treats AI features as a separate roadmap category with explicit staffing and budget requirements, reviewed quarterly against simpler alternatives. If you can't articulate why AI is 3-5x better than rules for this specific use case, the feature doesn't pass roadmap review. For more on how product teams can structure these trade-offs effectively, AI & Machine Learning in Enterprise Software provides a deeper strategic framework.

When AI Actually Belongs in Vendor Risk (and It's Not What You Think)

AI makes sense in vendor risk workflows in exactly one scenario: when you're processing continuous telemetry data from vendors post-onboarding, not point-in-time assessments during procurement. Anomaly detection on vendor API usage patterns, network traffic analysis for supply chain security, or behavioral scoring based on vendor support responsiveness and incident disclosure timelines — these are workflows where you're generating thousands of data points per vendor per month, human review isn't scalable, and you're looking for outliers rather than making binary approve/reject decisions.
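As a rough illustration of what outlier-hunting on vendor telemetry looks like, here is a z-score check over daily API call counts. The z-score is only one simple stand-in for anomaly detection, not a recommendation of a specific method, and the data and the 2.5 threshold are made up for the sketch.

```python
from statistics import mean, stdev
from typing import List


def flag_outliers(daily_calls: List[float], z_threshold: float = 2.5) -> List[int]:
    """Return indexes of days whose volume sits more than z_threshold
    sample standard deviations away from the mean."""
    if len(daily_calls) < 2:
        return []  # stdev needs at least two points
    mu = mean(daily_calls)
    sigma = stdev(daily_calls)
    if sigma == 0:
        return []  # perfectly flat series: nothing stands out
    return [i for i, calls in enumerate(daily_calls)
            if abs(calls - mu) / sigma > z_threshold]


# ASSUMPTION: ten days of API call counts for one vendor, with a spike on the last day.
calls = [100, 102, 98, 101, 99, 103, 97, 100, 101, 500]
print(flag_outliers(calls))
```

A plain z-score is crude: in short windows a single spike inflates the standard deviation and caps the achievable score, so real monitoring usually needs longer baselines or more robust statistics. The point is the shape of the problem: many data points, no binary approve/reject decision, and a human who only looks at what gets flagged.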

The product teams shipping AI-powered vendor risk platforms this quarter are mostly solving the wrong problem. They're automating initial assessments (low volume, high stakes, deterministic decision requirements) rather than continuous monitoring (high volume, lower stakes, probabilistic patterns). The irony is that continuous monitoring is where enterprises have the weakest tooling and where AI could deliver genuine differentiation. But it's harder to demo, harder to explain to procurement buyers, and doesn't map cleanly to the "vendor onboarding" workflows that security teams already understand. So vendors build the feature that sells, not the feature that solves the actual operational gap.

What is the biggest mistake teams make when adding AI to vendor risk management tools?

The biggest mistake is automating the initial vendor assessment workflow with AI instead of rule-based logic. Initial assessments are low-volume, require deterministic outputs, and demand full auditability for compliance. AI introduces model maintenance overhead and explainability challenges that exceed the time savings from automation. Rule-based systems handle this workflow more cost-effectively and transparently.

How do you know if your workflow needs AI or just better automation?

Apply the five-gate framework: volume justification, uncertainty tolerance, data sufficiency, failure cost asymmetry, and explainability requirement. If your workflow fails any gate, build a rule-based system first and instrument it thoroughly. Most enterprise workflows that teams assume need AI can be automated with deterministic logic at a fraction of the cost and complexity.
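The go/no-go logic is simple enough to write down as a checklist. This sketch treats each gate as a yes/no answer supplied by the team; the identifiers and the `review` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

GATES: List[str] = [
    "volume_justification",
    "uncertainty_tolerance",
    "data_sufficiency",
    "failure_cost_asymmetry",
    "explainability_requirement",
]


@dataclass
class GateReview:
    recommendation: str
    failed_gates: List[str]


def review(answers: Dict[str, bool]) -> GateReview:
    """Recommend AI only if every gate is clearly passed.
    A gate missing from the answers counts as not clearly passed."""
    failed = [gate for gate in GATES if not answers.get(gate, False)]
    if failed:
        return GateReview("build the rule-based system first", failed)
    return GateReview("AI is justified", [])


# The vendor risk workflow in this article failed Gate 2 and Gate 5.
vendor_risk = {gate: True for gate in GATES}
vendor_risk["uncertainty_tolerance"] = False
vendor_risk["explainability_requirement"] = False

result = review(vendor_risk)
print(result.recommendation, result.failed_gates)
```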

Why do AI-powered vendor risk tools have such low adoption rates after launch?

Users can't explain AI-generated risk scores to auditors or stakeholders, and they can't easily override incorrect predictions without escalating to security leadership. According to Gartner's 2024 research, 72% of AI features in enterprise compliance tools see sub-30% adoption because they add friction to workflows that previously had clear, auditable decision paths. Procurement teams revert to manual checklists that they trust and can defend.

For practitioners: Before you spec an AI feature, write the user documentation explaining how a user would justify the system's output to an auditor or executive. If you can't write that documentation clearly, you're building a tool users won't trust.

For engineering leaders: Budget for the three-year total cost of ownership, not just the initial model development sprint. If ongoing maintenance cost exceeds the operational savings from automation, the feature is net-negative ROI regardless of how well the model performs.
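A back-of-the-envelope version of that budget check might look like the following. The 180,000 annual maintenance figure is the survey number cited earlier in this article; the build cost and annual savings are hypothetical inputs chosen only to show the calculation.

```python
def three_year_net_value(annual_savings: float, build_cost: float,
                         annual_maintenance: float, years: int = 3) -> float:
    """Operational savings minus total cost of ownership over the period."""
    total_savings = years * annual_savings
    total_cost = build_cost + years * annual_maintenance
    return total_savings - total_cost


net = three_year_net_value(
    annual_savings=16_000.0,       # ASSUMPTION: value of manual review hours saved per year
    build_cost=250_000.0,          # ASSUMPTION: initial model development cost
    annual_maintenance=180_000.0,  # per-model maintenance figure cited in the article
)
print(f"Three-year net value: {net:,.0f}")
```

A negative result means the feature costs more to keep alive than it saves, however good the model's metrics look.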

When was the last time you audited whether your team's AI features are solving real workflow bottlenecks — or just adding "AI-powered" to your product marketing deck?

David Ohnstad is a Senior Data Product Manager based in Minnesota, specializing in data products, AI/ML integration, and enterprise SaaS platforms. Follow his work at github.com/davidohnstad40-netizen. For more on David Ohnstad's data product management writing and David Ohnstad's woodworking and making, visit his other sites.