After sitting on both sides of the table through more enterprise AI procurement processes than I can count, I keep seeing the same pattern. The organizations that end up with AI deployments that actually work asked harder questions earlier. The ones that end up with expensive regrets asked easier questions and felt good about the answers.
Here are the twelve questions that I have watched separate the two groups. Not because the questions are complicated, but because asking them sincerely and demanding real answers takes a kind of organizational discipline that most procurement processes do not have.
On data handling:
"Walk me through exactly what happens to my data between when an employee submits a query and when they receive a response. Which servers does it touch, which external services are called, and what is retained after the response is generated?"
This question has a specific correct answer and a specific evasive answer. The correct answer names the infrastructure components, the services, and the retention periods. The evasive answer talks about enterprise agreements, security certifications, and data privacy commitments without describing the actual data flow. If you get the evasive answer, ask the question again more specifically. If you still get the evasive answer, you have learned something important.
"What is your complete subprocessor list, and what data handling commitments does each subprocessor have?"
Most enterprise AI vendors use external LLM inference APIs, external embedding APIs, external monitoring services, and cloud infrastructure from major providers. Your data processing agreement is with the vendor. Your data flows through systems governed by agreements you have not seen. Ask for the complete list and verify that the protections cascade.
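One way to make "verify that the protections cascade" concrete is to put the subprocessor list into a small table and compare each entry against the commitments in your own agreement. The sketch below is only an illustration: the subprocessor names, the two commitments being checked, and the 30-day limit are all hypothetical placeholders, not terms from any real vendor or contract.

```python
# Hypothetical commitments taken from your own data processing agreement.
REQUIRED = {"no_training_on_customer_data": True, "max_retention_days": 30}

# Hypothetical subprocessor list as disclosed by the vendor.
subprocessors = [
    {"name": "llm-inference-api", "no_training_on_customer_data": True, "max_retention_days": 30},
    {"name": "embedding-api", "no_training_on_customer_data": True, "max_retention_days": 90},
    {"name": "monitoring-service", "no_training_on_customer_data": False, "max_retention_days": 14},
]


def find_gaps(subprocessors, required):
    """Return {subprocessor name: [problems]} where a commitment does not cascade."""
    gaps = {}
    for sp in subprocessors:
        problems = []
        # A missing answer is treated the same as a failing answer.
        if required["no_training_on_customer_data"] and not sp.get("no_training_on_customer_data", False):
            problems.append("may train on customer data")
        retention = sp.get("max_retention_days")
        if retention is None or retention > required["max_retention_days"]:
            problems.append("retention exceeds DPA limit or is unstated")
        if problems:
            gaps[sp["name"]] = problems
    return gaps


gaps = find_gaps(subprocessors, REQUIRED)
for name, problems in gaps.items():
    print(f"{name}: {', '.join(problems)}")
```

The useful part is less the code than the discipline it forces: every subprocessor gets a row, and every blank cell is a follow-up question for the vendor.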
"If we need to delete all data associated with our organization from your systems, including any derived data, cached prompts, or inference logs, what is the process and how long does it take?"
Deletion is harder than it sounds in distributed cloud infrastructure. Backups, caching layers, inference logs, and derived analytics can all hold your data beyond the obvious application layer. Understanding the deletion process before you need it is significantly more useful than understanding it during a GDPR deletion request.
On product reliability:
"Show me what the AI says when it doesn't know the answer, or when the retrieved information is insufficient to answer confidently."
This is the honesty test. Ask the vendor to demonstrate this live, with a question you know is not in their demo dataset. The tools that say "I don't have reliable information about this" or "the sources I found are from 2021 and may be outdated" are the tools that are calibrated to be honest about their limits. The tools that generate a fluent, authoritative-sounding answer to a question they have no basis to answer are the tools that will create trust-destroying errors in production.
"What is your retrieval accuracy on documents that were uploaded six months ago versus documents that were uploaded last week?"
Retrieval quality for recently indexed content is often better than for older content, because metadata extraction, embedding quality, and index optimization tend to improve over time, and older documents are not always reprocessed with the newer pipeline. If the vendor cannot give you a concrete answer about how freshness affects retrieval quality in their system, they have not measured it. If they have not measured it, they do not know where the quality floor is in production deployments.
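If the vendor has not measured this, you can approximate it yourself during a pilot. The sketch below assumes you have a small set of test questions where you know which document should come back, plus the upload date of that document. The field names, the 30-day cutoff, and the sample data are all made up for illustration, and "hit rate" here simply means the expected document appeared somewhere in the returned results.

```python
from collections import defaultdict
from datetime import date


def hit_rate_by_age(eval_cases, today, cutoff_days=30):
    """Return the retrieval hit rate for 'recent' vs 'older' documents.

    Each case is a dict with hypothetical keys:
      'uploaded'  -> date the target document was uploaded
      'expected'  -> id of the document that should be retrieved
      'retrieved' -> ids the system actually returned for the test question
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for case in eval_cases:
        age_days = (today - case["uploaded"]).days
        bucket = "recent" if age_days <= cutoff_days else "older"
        totals[bucket] += 1
        if case["expected"] in case["retrieved"]:
            hits[bucket] += 1
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


# Made-up evaluation cases: two recent uploads, two from roughly six months back.
cases = [
    {"uploaded": date(2024, 6, 1), "expected": "doc-a", "retrieved": ["doc-a", "doc-x"]},
    {"uploaded": date(2024, 6, 3), "expected": "doc-b", "retrieved": ["doc-b"]},
    {"uploaded": date(2023, 12, 1), "expected": "doc-c", "retrieved": ["doc-y", "doc-z"]},
    {"uploaded": date(2023, 11, 15), "expected": "doc-d", "retrieved": ["doc-d"]},
]

result = hit_rate_by_age(cases, today=date(2024, 6, 7))
print(result)
```

A real evaluation needs far more than four cases per bucket, but even a few dozen questions will show whether there is a gap worth asking the vendor about.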
"Tell me about a production incident you have had in the last twelve months where AI outputs were incorrect in a consequential way. What happened and what did you change?"
Every AI tool in production has had incidents where the output was wrong in a way that mattered. The vendors who have thought seriously about reliability can tell you about these incidents clearly and describe what they learned. The vendors who have not thought seriously about it will tell you they have not had any meaningful incidents, which is almost certainly not true.
On organizational fit:
"What does the customer relationship look like in month eighteen, not in month three?"
The month three relationship is the vendor's best foot forward: attentive account management, responsive support, proactive check-ins. Month eighteen is when you have moved from new customer to established customer and you have real experience of how the vendor treats accounts that are not in the critical adoption period. Ask for references who are eighteen months or more into the deployment and specifically ask those references what changed in the relationship over time.
"What is the renewal price if our usage has grown by 50% over the first year?"
Usage-based pricing is not inherently bad. Surprise usage-based pricing at renewal is. Get a specific number for the growth scenario you actually expect before you sign the initial contract. If the vendor says the renewal pricing depends on factors they cannot estimate until renewal, that is information.
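It helps to run the arithmetic yourself before the vendor does it for you. The sketch below assumes a simple "base fee plus overage" model. The pricing structure and every number in it are hypothetical, so substitute the terms from the actual quote.

```python
def annual_cost(base_fee, included_units, overage_per_unit, units_used):
    """Annual cost under a hypothetical 'base fee plus overage' pricing model."""
    overage_units = max(0, units_used - included_units)
    return base_fee + overage_units * overage_per_unit


# All figures are illustrative assumptions, not real vendor pricing.
BASE_FEE = 100_000        # dollars per year
INCLUDED_UNITS = 1_000    # usage included in the base fee, in thousands of queries
OVERAGE_PER_UNIT = 80     # dollars per additional thousand queries
GROWTH_PCT = 50           # the growth scenario you actually expect

year_one_units = 1_000
year_two_units = year_one_units * (100 + GROWTH_PCT) // 100

year_one = annual_cost(BASE_FEE, INCLUDED_UNITS, OVERAGE_PER_UNIT, year_one_units)
year_two = annual_cost(BASE_FEE, INCLUDED_UNITS, OVERAGE_PER_UNIT, year_two_units)
increase_pct = (year_two - year_one) * 100 / year_one

print(f"Year one: ${year_one:,}  Renewal: ${year_two:,}  Increase: {increase_pct:.0f}%")
```

With these made-up terms, 50% usage growth produces a 40% price increase at renewal. The point of the exercise is to get the vendor to confirm or correct each input in writing, so the renewal number is not a surprise.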
"Who in your organization is accountable if this deployment is not delivering value twelve months from now?"
This question is as much for your organization as it is for the vendor. The right answer internally is a specific person with a specific metric, not "we are all accountable for the success of this initiative." In my experience, diffuse accountability for AI deployment outcomes is one of the strongest predictors of underperformance.
On the exit scenario:
"What does leaving look like? What can we export, in what format, and what happens to the workflows we've built on your platform?"
Ask this question early and watch the reaction. Vendors with clean exit paths are not afraid of this question. Vendors whose business model depends on switching costs become evasive about it. The answer tells you how the vendor thinks about the relationship.
"If your company were acquired tomorrow, what would happen to our contract, our data, and our service terms?"
Acquisition risk is real in the current AI market. Many of the vendors you are evaluating will not exist in their current form in three years. They will be acquired, pivot, or shut down. Understanding the contractual protections you have in these scenarios before you commit is far more useful than discovering them after the acquisition has happened.
On the realistic case:
"What does a typical deployment look like for a company our size, and what are the two or three things that most commonly prevent organizations like ours from reaching their projected ROI?"
This question reveals whether the vendor has an honest model of why their product sometimes fails. The vendors who can answer it clearly and specifically, naming the actual reasons deployments underperform, are the vendors who have learned from deployment experience and built that learning into their sales process. The vendors who cannot answer it, or who answer with vague references to change management or data quality without specifics, are the vendors who either have not had enough failed deployments to learn from or are not being honest about the ones they have had.
The purpose of these questions is not to be difficult. It is to do the evaluation work that vendor demos are not designed to help you do. A compelling demo answers the question of whether the technology can work in optimal conditions. These questions help you understand whether it will work in your conditions, with your data, over your timeline, with your organizational constraints.
The organizations that ask them tend to make better decisions. The ones that do not tend to end up explaining to boards why the AI initiative that looked so promising at the demo stage is not delivering what was projected.
Ask the questions. Get real answers. Make better decisions.