David Ohnstad

Posted on Jun 5 • Originally published at davidohnstad.net

AI Vendor Risk Assessment: Why We Shut It Down


This article was originally published on davidohnstad.net. I cross-post here to reach the Dev.to community.

We Spent Fourteen Months Building an AI-Powered Vendor Risk Assessment System. Then We Deprecated It.

The request came from the CISO in March 2024: automate our third-party security questionnaires using natural language processing. We had 340 vendors in the compliance queue, each requiring a 90-question security assessment every 18 months. The manual process consumed 11 full-time equivalents across three departments. An AI-powered vendor risk platform, the executive team reasoned, would cut that by 70% while improving response accuracy.

We shipped in May 2025. By October, the compliance team had reverted to their original spreadsheet workflow. The AI model generated assessments faster — but introduced enough low-confidence edge cases that reviewers spent more time validating output than they had spent writing responses manually. According to Gartner's 2024 Third-Party Risk Management Survey, 62% of enterprises implementing AI-driven vendor risk tools report similar outcomes: faster processing, but no reduction in human review hours. We had built a feature the market wanted but our specific workflow could not absorb.

The mistake wasn't technical execution. The model worked. The mistake was skipping the decision framework that would have told us, six weeks into the project, that rule-based automation would deliver 80% of the value at 15% of the cost. David Ohnstad, working on AI & Machine Learning in Enterprise Software product strategy at Veeam, has since built a repeatable process to prevent exactly this kind of expensive misalignment between AI capability and operational readiness.

Why Vendor Risk Management Became the AI Testing Ground — And Why Most Implementations Stall

Vendor risk management emerged as an early AI adoption category for three reasons. First, the volume problem is real: enterprises manage an average of 583 third-party relationships according to Deloitte's 2023 Third-Party Risk Management Survey, with regulatory pressure increasing assessment frequency. Second, the task appears pattern-friendly — security questionnaires repeat similar structures, making them superficially suitable for NLP classification. Third, vendors smell budget: compliance leaders defending their AI adoption roadmaps to boards need a concrete use case, and vendor risk platforms cost $180,000 to $450,000 annually for mid-market deployments, creating a lucrative sales cycle.

But the success rate tells a different story. According to Forrester's Q4 2024 analysis of AI adoption in GRC tools, only 23% of organizations deploying AI-powered vendor risk platforms report measurably reduced review cycle time after 12 months of use. The other 77% report one of three outcomes: stalled adoption (the tool exists but teams revert to prior workflows), scope reduction (AI features are disabled and the platform functions as an expensive database), or abandonment (the contract is not renewed). The pattern David Ohnstad observed in enterprise AI pilots applies here with precision: teams buy the capability without auditing whether their workflow can absorb probabilistic output.

The failure mode is structural, not technical. AI-powered vendor risk tools excel at pattern recognition across large document sets — parsing security questionnaires, flagging anomalies in vendor documentation, surfacing compliance gaps based on historical data. They struggle with edge-case judgment, ambiguous vendor responses, and context-specific risk tolerance thresholds that vary by department. A compliance reviewer reading a vendor's answer to "Do you encrypt data at rest?" can assess evasiveness, probe follow-up questions, and escalate based on the vendor's strategic importance. An AI model returns a confidence score. If your workflow requires nuanced judgment on 18% of assessments — the median figure from our post-mortem analysis — you have not automated the workflow. You have added a preprocessing step that still requires full human review.

The Vendor Risk AI Readiness Framework

The Vendor Risk AI Readiness Framework is a four-gate decision model. Each gate is a go/no-go checkpoint. If you cannot answer yes to a gate's criteria, rule-based automation or process redesign will outperform AI implementation. The framework is designed to be applied in 90 minutes with cross-functional stakeholders present, not as a six-month feasibility study.

Gate 1: Volume and Pattern Consistency. Does your vendor assessment workload exceed 200 completed questionnaires per year, and do at least 60% of questions repeat identical or near-identical phrasing across vendor types? If your volume is lower or your questions vary significantly by vendor category, rule-based automation using templated responses and conditional logic will match AI performance at one-tenth the implementation cost. We audited our questionnaire history and found 340 vendor assessments annually, but only 41% question consistency — vendors in healthcare, finance, and infrastructure categories required domain-specific questions that broke pattern recognition models. Gate 1 fail.
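The Gate 1 audit can be run on questionnaire history without any model. Here is a minimal sketch, assuming each questionnaire is available as a list of question strings. The function names, the 0.9 similarity cutoff, and the use of Python's difflib as a stand-in for "near-identical phrasing" are illustrative assumptions, not the method we used. Only the 200-per-year and 60% thresholds come from the gate itself.

```python
from difflib import SequenceMatcher


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(question.lower().split())


def pattern_consistency(questionnaires: list[list[str]], threshold: float = 0.9) -> float:
    """Share of questions with a near-identical match in some other questionnaire.

    The 0.9 similarity threshold is an assumed cutoff, not a figure from the article.
    """
    normalized = [[normalize(q) for q in qs] for qs in questionnaires]
    total = 0
    repeated = 0
    for i, questions in enumerate(normalized):
        # Every question that appears in a different questionnaire.
        others = [q for j, qs in enumerate(normalized) if j != i for q in qs]
        for q in questions:
            total += 1
            if any(SequenceMatcher(None, q, o).ratio() >= threshold for o in others):
                repeated += 1
    return repeated / total if total else 0.0


def gate_1_passes(completed_per_year: int, consistency: float) -> bool:
    """Gate 1: more than 200 questionnaires a year and at least 60% repeated questions."""
    return completed_per_year > 200 and consistency >= 0.60
```

With our numbers, gate_1_passes(340, 0.41) returns False, which is the result the manual audit gave us.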

Gate 2: Tolerance for Probabilistic Output. Can your compliance workflow absorb answers flagged with confidence scores between 65% and 85% without requiring full manual re-review? AI models perform well at the extremes — high-confidence matches and obvious failures — but vendor risk edge cases cluster in the middle band. If your regulatory environment, audit requirements, or internal risk appetite demand human review of ambiguous responses, you are not eliminating labor, you are redistributing it. Our compliance team's risk tolerance, driven by SOC 2 and ISO 27001 audit requirements, required validation of any response below 90% confidence. In practice, 34% of AI-generated answers fell into that validation queue. Gate 2 fail.
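The size of the validation queue is easy to estimate from a sample of model output before launch. The sketch below assumes you can export one confidence score per generated answer. The function names and the six sample scores are hypothetical. The 90% review cutoff and the 65% to 85% band are the figures described above.

```python
def share_below(confidences: list[float], cutoff: float) -> float:
    """Fraction of answers whose confidence falls below the review cutoff."""
    if not confidences:
        return 0.0
    return sum(1 for c in confidences if c < cutoff) / len(confidences)


def share_in_band(confidences: list[float], low: float = 0.65, high: float = 0.85) -> float:
    """Fraction of answers inside the ambiguous middle band (bounds inclusive)."""
    if not confidences:
        return 0.0
    return sum(1 for c in confidences if low <= c <= high) / len(confidences)


# Hypothetical confidence scores for six AI-generated answers.
scores = [0.95, 0.97, 0.88, 0.70, 0.99, 0.92]

validation_queue = share_below(scores, cutoff=0.90)  # share needing human validation
middle_band = share_in_band(scores)                  # share in the 65% to 85% band
```

If the validation share measured on a realistic sample is close to what reviewers already handle manually, the gate is telling you the labor is being moved rather than removed.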

Gate 3: Structured Feedback Loop Infrastructure. Do you have an existing mechanism to capture when the AI model produces incorrect or unhelpful output, and can that feedback retrain the model within a 30-day cycle? According to McKinsey's 2024 State of AI Report, 68% of enterprises deploying AI tools in operational workflows lack the MLOps infrastructure to iterate models based on user corrections. If you cannot close the feedback loop, model accuracy degrades as vendor language evolves, regulatory standards shift, and your internal risk definitions change. We had telemetry on model performance but no process to feed corrections back into training data. Gate 3 fail.
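What we were missing was small in code terms: a record of each reviewer correction and a check on how long it waited before reaching a training run. The sketch below is one possible shape for that record, not a description of any MLOps product. The Correction fields and the overdue_corrections helper are hypothetical names, and only the 30-day cycle comes from the gate.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Correction:
    question_id: str
    model_answer: str
    reviewer_answer: str
    recorded_on: date
    retrained_on: Optional[date] = None  # None until the correction reaches a training run


def overdue_corrections(
    corrections: list[Correction], today: date, max_cycle_days: int = 30
) -> list[Correction]:
    """Corrections that took, or are still taking, longer than the retraining cycle."""
    limit = timedelta(days=max_cycle_days)
    overdue = []
    for c in corrections:
        # Open corrections are measured against today's date.
        closed_on = c.retrained_on or today
        if closed_on - c.recorded_on > limit:
            overdue.append(c)
    return overdue
```

A non-empty result on a regular schedule is a sign the feedback loop is not closing within the cycle the gate asks for.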

Gate 4: Change Management and User Trust. Have compliance reviewers been involved in defining model behavior from sprint zero, and do they trust probabilistic output enough to act on it without re-validating manually? This is the least technical gate and the most commonly ignored. AI tools inserted into established workflows without user co-design generate resistance — not because the tool is flawed, but because users have no mental model for when to trust it. Our compliance team was consulted on requirements but not involved in iterative model testing. When the tool launched, they treated every AI-generated response as a draft requiring full verification. Gate 4 fail.

Four gates, zero passes. The Vendor Risk AI Readiness Framework would have told us in week six to pivot to a rule-based template system with conditional branching. We would have saved $340,000 in development costs and eight months of roadmap time. The lesson David Ohnstad has carried forward into every AI feature discussion since: volume alone does not justify AI — pattern consistency, tolerance for ambiguity, feedback infrastructure, and user trust are equally determinative. Miss any one, and you are building a feature that will be disabled within a year.
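The go/no-go logic is deliberately strict: one failed gate is enough to redirect the project. Here is a minimal sketch of that rule as a checklist. GateResult and recommend are hypothetical names, and the evidence strings restate the findings from our own review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    evidence: str


def recommend(gates: list[GateResult]) -> str:
    """Any failed gate points to rule-based automation or process redesign."""
    failed = [g.name for g in gates if not g.passed]
    if failed:
        return "rule-based automation or process redesign (failed: " + ", ".join(failed) + ")"
    return "proceed with AI implementation"


# Our own results, as described in the four gates above.
our_review = [
    GateResult("Gate 1", False, "41% question consistency against a 60% bar"),
    GateResult("Gate 2", False, "34% of answers fell below the 90% confidence cutoff"),
    GateResult("Gate 3", False, "telemetry existed, but no path back into training data"),
    GateResult("Gate 4", False, "reviewers were consulted, not involved in model testing"),
]
decision = recommend(our_review)
```

Recording the evidence next to each pass or fail keeps the 90-minute session honest, because every answer has to point at a number or an observation.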

What the CISO Got Right — And What Product Should Have Challenged

Our CISO's instinct was sound: manual vendor risk assessments were a resource bottleneck, and automation was the correct strategic direction. The mistake was in the definition of automation. AI became the default assumption because the sales cycle had primed the executive team to expect it. Every vendor risk platform demo in 2024 featured NLP-powered questionnaire parsing, and the pricing model incentivized the AI tier — $180,000 for rule-based automation, $420,000 for the AI-enabled version. The cost delta created an anchoring effect: paying more signaled a more serious commitment to modernization.

Product management should have challenged the assumption with a forcing question: what percentage of our vendor assessments require judgment that cannot be encoded in rules? We ran that analysis post-mortem and found the answer was 18% — edge cases involving ambiguous vendor answers, vendors in emerging risk categories, or vendors whose strategic importance required elevated scrutiny. For the remaining 82%, rule-based templates could have auto-populated responses based on vendor type, historical answers, and conditional logic trees. A hybrid model — rules for the bulk, human review for the edge cases — would have delivered 70% time savings without introducing probabilistic output into a low-tolerance-for-error workflow.

David Ohnstad now uses this as a litmus test when scoping enterprise AI pilots and proofs of concept: if you can define the edge-case percentage and it is below 25%, start with deterministic automation and add AI only where pattern recognition genuinely outperforms rules. The inverse, deploying AI first and discovering the edge-case percentage later, produces the outcome we experienced: a technically successful model that fails operationally because the workflow cannot absorb its output.
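The edge-case percentage can be estimated from a tagged sample of past assessments. The sketch below assumes reviewers have marked each assessment with the three edge-case conditions from our post-mortem. The Assessment class, its field names, and the wording of the returned recommendations are hypothetical. The article gives no guidance for shares at or above 25%, so that branch is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    vendor: str
    ambiguous_answers: bool = False
    emerging_risk_category: bool = False
    strategically_important: bool = False

    @property
    def is_edge_case(self) -> bool:
        """True when any condition needs judgment that rules cannot encode."""
        return (
            self.ambiguous_answers
            or self.emerging_risk_category
            or self.strategically_important
        )


def edge_case_share(assessments: list[Assessment]) -> float:
    """Fraction of assessments that need human judgment."""
    if not assessments:
        return 0.0
    return sum(1 for a in assessments if a.is_edge_case) / len(assessments)


def starting_point(share: float, litmus: float = 0.25) -> str:
    """Apply the 25% litmus test; the branch above the litmus is an assumption."""
    if share < litmus:
        return "deterministic automation first"
    return "run the four gates before choosing"
```

For our measured share, starting_point(0.18) returns "deterministic automation first".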

The Rule-Based Pivot We Should Have Built Instead

Three months after deprecating the AI model, we rebuilt the vendor risk system using conditional logic and templated responses. The architecture was simpler: a questionnaire engine that mapped vendor types to pre-approved answer banks, with flagging rules for responses that required compliance review. Vendors in the "infrastructure" category automatically received pre-written answers to 68 of 90 questions, with 22 questions routed to human review based on the vendor's data access tier. Vendors in "healthcare" received a different template set, and so on.
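The core of that engine fits in a few dozen lines. The sketch below is a simplified illustration, not our production code: the question identifiers, answer texts, tier numbering (1 as the most sensitive access), and the REVIEW_TIER rule table are all hypothetical. It shows the two behaviors that mattered, which are answers pulled from a bank keyed by vendor category and a stated reason whenever a question is routed to a reviewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vendor:
    name: str
    category: str
    data_access_tier: int  # hypothetical scale: 1 = most sensitive access


# Hypothetical pre-approved answer bank, keyed by vendor category.
ANSWER_BANK = {
    "infrastructure": {
        "encryption_at_rest": "Vendor attests to encryption of stored data.",
        "backup_policy": "Vendor attests to documented backup procedures.",
    },
    "healthcare": {
        "encryption_at_rest": "Vendor attests to encryption of stored PHI.",
    },
}

# Hypothetical flagging rules: review when the vendor's tier is at or below this value.
REVIEW_TIER = {"encryption_at_rest": 1}


def assess(vendor: Vendor, question_ids: list[str]) -> dict[str, tuple[str, str]]:
    """Return (status, text) per question; text is the answer or the review reason."""
    bank = ANSWER_BANK.get(vendor.category, {})
    results = {}
    for qid in question_ids:
        rule_tier = REVIEW_TIER.get(qid)
        if rule_tier is not None and vendor.data_access_tier <= rule_tier:
            results[qid] = ("review", f"data access tier {vendor.data_access_tier} triggers the review rule")
        elif qid in bank:
            results[qid] = ("auto", bank[qid])
        else:
            results[qid] = ("review", f"no pre-approved answer for category {vendor.category!r}")
    return results
```

Because every routing decision carries its reason as plain text, a reviewer can see why a question landed in the queue without consulting anything other than the rule table.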

Implementation took 11 weeks. Cost: $47,000 in development time plus $18,000 annually for the questionnaire platform. Time savings: 64% reduction in compliance review hours, measured over the first six months post-launch. User adoption: immediate. The compliance team trusted the system because the logic was transparent — they could see exactly why an answer was auto-populated or flagged for review, and they controlled the answer bank. There was no probabilistic confidence score to second-guess.

The contrast is instructive. The AI model was more sophisticated, handled linguistic variation better, and impressed stakeholders in demos. The rule-based system was less elegant, required more upfront configuration, and could not adapt to novel question phrasing without manual updates. But the rule-based system matched the workflow's actual tolerance for automation, required no MLOps infrastructure, and delivered ROI in quarter one instead of quarter five. David Ohnstad's guideline: sophistication is not the goal — operational fit is. If a simpler tool delivers 80% of the value at 20% of the cost and integrates into existing workflows without retraining users, it is the correct choice even if it is less technically interesting.

When AI Does Warrant the Investment — Three Counterfactual Scenarios

The Vendor Risk AI Readiness Framework is designed to say no. But there are scenarios where AI-powered vendor risk management clears all four gates and justifies the investment. Here are three, drawn from enterprises David Ohnstad has observed successfully deploying these tools.

Scenario 1: High-Volume, Low-Stakes Screening. A financial services company managing 1,200+ vendors annually uses an AI model for initial triage — not final assessment. Vendors are scored on a 0-100 risk scale based on questionnaire responses, historical audit data, and external breach databases. The top 15% (high-risk vendors) are routed to manual compliance review. The bottom 60% (low-risk vendors with clean histories) receive auto-approval with annual re-assessment. The middle 25% receive a hybrid review: AI-generated summary with human sign-off. This workflow works because the AI is not making final decisions — it is segmenting the queue. The compliance team tolerates probabilistic output in the middle band because the stakes for that cohort are lower, and high-risk vendors always receive full human review.
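The segmentation step in this scenario is simple once scores exist. Here is a minimal sketch, assuming the 15/25/60 split is applied by rank within the vendor population rather than by fixed score thresholds, which is one reading of the description above. The triage function name and the route labels are hypothetical.

```python
def triage(scores: dict[str, float]) -> dict[str, str]:
    """Route vendors by rank on a 0-100 risk score, highest risk first.

    Top 15% go to manual review, the next 25% to hybrid review,
    and the remaining 60% to auto-approval.
    """
    ranked = sorted(scores, key=lambda vendor: scores[vendor], reverse=True)
    n = len(ranked)
    manual_cut = round(n * 0.15)
    hybrid_cut = round(n * 0.40)  # top 15% plus the middle 25%
    routes = {}
    for position, vendor in enumerate(ranked):
        if position < manual_cut:
            routes[vendor] = "manual review"
        elif position < hybrid_cut:
            routes[vendor] = "hybrid review"
        else:
            routes[vendor] = "auto-approve"
    return routes
```

The model only decides which queue a vendor joins. The highest-risk queue never bypasses a human, which is why probabilistic scores are tolerable here.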

Scenario 2: Multi-Language, Cross-Border Vendor Populations. A European SaaS company with vendor relationships across 18 countries uses NLP to parse security questionnaires submitted in seven languages. Rule-based automation would require maintaining answer banks in seven languages with regional compliance variations — a maintenance burden that exceeds the cost of the AI model. The AI tool translates, normalizes, and scores responses, then routes ambiguous answers to regional compliance leads. This works because the alternative — hiring multilingual compliance reviewers or outsourcing translation — costs more than the AI platform, and the linguistic complexity genuinely exceeds what deterministic rules can handle.

Scenario 3: Continuous Vendor Monitoring, Not Just Periodic Assessment. A healthcare technology company uses an AI model to monitor vendor risk signals continuously: breach disclosures, changes in security certifications, regulatory actions, leadership turnover, financial distress indicators. The model ingests external data feeds and flags vendors whose risk profile has shifted since their last formal assessment. This is not a questionnaire automation problem — it is a signal aggregation problem across unstructured data sources that update asynchronously. AI's pattern recognition across large, noisy datasets justifies the cost because there is no rule-based equivalent. The compliance team acts on flags, not on auto-generated assessments, so the tolerance for probabilistic output is higher.
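The hard part of this scenario, turning unstructured feeds into structured risk signals, is the work attributed to the model and is not shown here. The sketch below covers only the final step under the assumption that signals have already been extracted: comparing each signal's date with the vendor's last formal assessment. The Signal class, the signal kinds, and vendors_to_flag are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Signal:
    vendor: str
    kind: str  # e.g. "breach_disclosure", "certification_change"
    observed_on: date


def vendors_to_flag(
    signals: list[Signal], last_assessed: dict[str, date]
) -> dict[str, list[str]]:
    """Map each vendor to the signal kinds seen since its last formal assessment."""
    flags: dict[str, list[str]] = {}
    for s in signals:
        assessed = last_assessed.get(s.vendor)
        # A vendor with no recorded assessment is always flagged.
        if assessed is None or s.observed_on > assessed:
            flags.setdefault(s.vendor, []).append(s.kind)
    return flags
```

The output is a list of reasons to look at a vendor, not an assessment, which matches how the compliance team in this scenario uses it.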

All three scenarios share a common structure: AI handles volume, ambiguity, or unstructured data aggregation, but humans retain decision authority. The failure mode David Ohnstad observed, and that Forrester's data confirms, occurs when AI is expected to replace judgment entirely. Successful implementations use AI as a preprocessor, a triage tool, or a signal aggregator, not as the final compliance decision engine. The same lesson applies to enterprise AI budget and ROI planning: cost justification depends on whether AI eliminates a bottleneck that deterministic tools cannot address, not on whether AI performs the task faster.

The Budget Conversation Product Managers Should Force Before Design Begins

Here is the question David Ohnstad now asks in every AI feature kickoff: if this model achieves 90% accuracy, what happens to the 10% of cases where it is wrong? Can the workflow absorb errors at that rate, or does a single false positive create compliance risk, customer impact, or audit exposure that negates the efficiency gain?

For vendor risk management, the answer was clear in hindsight: a single incorrect risk assessment that allowed a non-compliant vendor into the supply chain could trigger audit findings, regulatory penalties, or breach liability that exceeded the entire cost of manual review. The risk asymmetry made AI a poor fit. A slower, human-reviewed process was preferable to a faster, probabilistic one because the downside of error was unacceptable.

This is the forcing function that should gate AI investment decisions — not market trends, not vendor demos, not the sophistication of the model. If you cannot define your error tolerance and map it to model accuracy, you are not ready to build. Product managers who skip this step and justify AI features based on capability rather than operational fit produce exactly the outcome we experienced: a feature that works technically but fails operationally because the organization cannot absorb its output.
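Mapping error tolerance to model accuracy can start as back-of-envelope arithmetic. The sketch below uses a deliberately crude model in which every model error costs the same amount and every case saves the same amount. Both simplifications, along with the function and parameter names, are assumptions for illustration. The break-even accuracy follows from setting savings equal to expected error cost: N·s = N·(1 − a)·c, so a = 1 − s/c.

```python
def expected_net_benefit(
    cases_per_year: int, accuracy: float, savings_per_case: float, cost_per_error: float
) -> float:
    """Yearly savings minus the expected cost of model errors."""
    expected_errors = cases_per_year * (1.0 - accuracy)
    return cases_per_year * savings_per_case - expected_errors * cost_per_error


def break_even_accuracy(savings_per_case: float, cost_per_error: float) -> float:
    """Minimum accuracy at which savings cover expected error cost."""
    if cost_per_error <= 0:
        raise ValueError("cost_per_error must be positive")
    return 1.0 - savings_per_case / cost_per_error
```

The useful output is the ratio rather than any single number: when one error costs 100 times what a case saves, the model has to be right 99% of the time just to break even.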

The budget conversation should also include the total cost of ownership beyond the initial build. According to IDC's 2024 AI Infrastructure Survey, enterprises underestimate AI operational costs by an average of 240% in year one. Model retraining, MLOps infrastructure, data labeling, and user retraining consume more budget than the initial development cycle. For the vendor risk project, our $340,000 development cost would have required an additional $180,000 annually in model maintenance and retraining infrastructure — costs that were not in the original business case because we assumed the model would perform stably without continuous iteration. That assumption was wrong. Vendor language evolves, regulatory standards change, and internal risk definitions shift quarterly. A model that is not continuously retrained degrades in accuracy, and degradation in a compliance context creates liability.
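Putting the article's own figures side by side makes the gap concrete. The sketch below adds build cost to recurring cost over a chosen horizon. The dollar amounts are the ones quoted in this post, while the one-year and three-year horizons are arbitrary choices. The two recurring figures cover different things (model maintenance for the AI system, platform fees for the rule-based one), so treat the comparison as rough.

```python
def total_cost(build: int, annual: int, years: int) -> int:
    """Build cost plus recurring cost over the given number of years."""
    return build + annual * years


AI_BUILD, AI_ANNUAL = 340_000, 180_000      # development, then maintenance and retraining
RULES_BUILD, RULES_ANNUAL = 47_000, 18_000  # development, then questionnaire platform

for years in (1, 3):
    ai = total_cost(AI_BUILD, AI_ANNUAL, years)
    rules = total_cost(RULES_BUILD, RULES_ANNUAL, years)
    print(f"{years} yr: AI ${ai:,} vs rules ${rules:,}")
# 1 yr: AI $520,000 vs rules $65,000
# 3 yr: AI $880,000 vs rules $101,000
```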

What This Means for Product Teams Evaluating AI Features in Q2 2026

Q2 budget reviews are underway, and product teams inheriting AI roadmaps mid-year are asking the right question: does this feature justify the cost, or are we building it because the market expects it? The Vendor Risk AI Readiness Framework applies beyond vendor risk — it is a generalized decision model for any AI feature in an enterprise workflow.

Run the four gates before committing resources. Gate 1: volume and pattern consistency — does the problem involve enough repetitive data that pattern recognition outperforms rules? Gate 2: tolerance for probabilistic output — can your workflow absorb confidence scores between 65% and 85% without requiring full manual review? Gate 3: feedback loop infrastructure — can you capture model errors and retrain within 30 days? Gate 4: user trust and change management — have end users co-designed the feature and will they act on its output?

If you cannot pass all four gates, you are not building an AI feature — you are building an expensive science project that will be deprecated within 18 months. The discipline David Ohnstad has carried forward from the vendor risk failure is this: AI is a tool that solves specific problems where deterministic approaches fail. It is not a strategy. It is not a signal of technical sophistication. It is a conditional solution whose cost must be justified against simpler alternatives.

For product managers entering the enterprise SaaS space in 2026, this is the skill that will separate high-performing teams from those that burn budget on features users disable: the ability to say no to AI when rules-based automation delivers equivalent value at lower cost and higher reliability. It is not the flashy position. It will not win you a spot on a conference panel about current AI adoption. But it will keep your product roadmap focused on outcomes users care about — and that is the definition of product management maturity.

How do I know if my vendor risk workflow is ready for AI automation?

Apply the four-gate readiness framework: evaluate volume and pattern consistency, assess your tolerance for probabilistic output, confirm you have feedback loop infrastructure to retrain models, and verify user trust through co-design. If you cannot pass all four gates, rule-based automation will deliver better ROI than AI-powered tools.

What is the difference between AI-powered and rule-based vendor risk management?

AI-powered tools use machine learning to recognize patterns in unstructured data and generate assessments with confidence scores. Rule-based systems use conditional logic and templated responses that produce deterministic outputs. Rule-based systems are simpler, more transparent, and higher-trust in low-tolerance-for-error workflows. AI excels when volume, linguistic variation, or unstructured data exceed what rules can handle.

Why do most AI vendor risk implementations fail to reduce review time?

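According to Forrester's Q4 2024 analysis of AI adoption in GRC tools, only 23% of organizations deploying AI-powered vendor risk platforms report measurably reduced review cycle time after 12 months. The other 77% stall, scale back to non-AI features, or abandon the tool, typically because their workflows cannot absorb probabilistic output. Teams revert to manual review of AI-generated responses, adding a validation step instead of eliminating labor. Success requires matching AI output confidence to organizational risk tolerance, not just deploying the tool.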

Practitioner Takeaway: Before you spec an AI feature, define your edge-case percentage and your error tolerance. If edge cases fall below 25% of workload or a single error creates unacceptable risk, start with deterministic automation and add AI only where pattern recognition genuinely outperforms rules. Sophistication is not the goal; operational fit is.

Leadership Takeaway: Stop approving AI features based on capability demos. Require product teams to pass a four-gate readiness framework that evaluates pattern consistency, error tolerance, feedback infrastructure, and user trust before committing budget. AI that cannot integrate into existing workflows without retraining users will be deprecated regardless of technical performance.

When did you last audit whether your AI roadmap is driven by operational necessity or by vendor sales cycles? David Ohnstad's writing on data product management and on leadership and career growth offers frameworks for evaluating AI investment decisions with practitioner-grade rigor.

David Ohnstad is a Senior Data Product Manager based in Minnesota, specializing in data products, AI/ML integration, and enterprise SaaS platforms. Follow his work at github.com/davidohnstad40-netizen.
