Every week, I talk to engineering leads who've been handed the same brief: "Figure out where we should be using AI."
It sounds like a strategy question. It's actually an engineering question in disguise — because the right answer depends on factors that non-technical leadership can't easily evaluate: data availability, task structure, output verifiability, integration complexity.
Here's the framework we use to answer it systematically.
The 4-Dimension Workflow Scoring Matrix
For any workflow you're evaluating, score it from 1 to 5 on each of these four dimensions:
```python
workflow_score = {
    "task_structure": 0,        # How structured and repeatable is the task?
    "data_availability": 0,     # Do we have historical data to train/guide the AI?
    "output_verifiability": 0,  # Can we verify the AI output is correct without deep expertise?
    "volume_frequency": 0,      # How often does this task occur?
}
```
```python
# Scoring rubric (scores of 2 and 4 interpolate between the anchors below)
TASK_STRUCTURE_GUIDE = {
    1: "Highly ambiguous — requires accumulated domain judgment",
    3: "Mixed — structured core with some judgment calls",
    5: "Fully structured — clear decision criteria or rule-based",
}

DATA_AVAILABILITY_GUIDE = {
    1: "No historical data exists",
    3: "Some data, partially structured",
    5: "Rich historical data, well-labelled, production-quality",
}

OUTPUT_VERIFIABILITY_GUIDE = {
    1: "Near impossible to verify without deep domain expertise",
    3: "Can sample-review ~20% of outputs",
    5: "Automated verification possible (format, schema, rules)",
}

VOLUME_FREQUENCY_GUIDE = {
    1: "< 10 instances/month",
    3: "50–200 instances/month",
    5: "> 1,000 instances/month or continuous",
}
```
```python
# Weighted final score
def ai_readiness_score(scores):
    weights = {
        "task_structure": 0.30,
        "data_availability": 0.30,
        "output_verifiability": 0.25,
        "volume_frequency": 0.15,
    }
    return sum(scores[k] * weights[k] for k in scores)

# Interpretation
#   3.5+     → Strong AI candidate. Start here.
#   2.5–3.5  → Viable, but address the low-scoring dimension first.
#   < 2.5    → Not yet ready. Data or structure work is needed before AI.
```
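Putting the pieces together, here is a self-contained run for the invoice-extraction workflow (the scorer is repeated so the snippet runs on its own; the `interpret` helper simply encodes the interpretation bands above):

```python
WEIGHTS = {
    "task_structure": 0.30,
    "data_availability": 0.30,
    "output_verifiability": 0.25,
    "volume_frequency": 0.15,
}

def ai_readiness_score(scores):
    # Weighted sum across the four dimensions
    return sum(scores[k] * WEIGHTS[k] for k in scores)

def interpret(score):
    # Maps a score onto the interpretation bands
    if score >= 3.5:
        return "Strong AI candidate. Start here."
    if score >= 2.5:
        return "Viable, but address the low-scoring dimension first."
    return "Not yet ready. Data or structure work is needed before AI."

invoice_extraction = {
    "task_structure": 5,
    "data_availability": 4,
    "output_verifiability": 5,
    "volume_frequency": 4,
}

score = ai_readiness_score(invoice_extraction)
print(round(score, 2), "->", interpret(score))  # 4.55 -> Strong AI candidate. Start here.
```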
Example: Scoring 5 Common Workflows
| Workflow                    | Structure | Data | Verify | Volume | Score |
|-----------------------------|-----------|------|--------|--------|-------|
| Invoice data extraction     | 5         | 4    | 5      | 4      | 4.55 ← Start here |
| Customer support triage     | 4         | 4    | 3      | 5      | 3.90 ← Strong |
| Employee onboarding Q&A     | 4         | 3    | 4      | 3      | 3.55 ← Good |
| Contract risk flagging      | 3         | 3    | 2      | 3      | 2.75 ← Later |
| Strategic pricing decisions | 1         | 2    | 1      | 2      | 1.45 ← Not AI |
Invoice extraction and support triage score the highest. Contract risk flagging is interesting, but the output verifiability gap makes it premature. Strategic pricing stays with humans.
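As a sanity check, the whole table can be reproduced and ranked programmatically (same weights as above; the per-workflow scores are taken from the table rows):

```python
# Weights in dimension order: structure, data, verifiability, volume
WEIGHTS = (0.30, 0.30, 0.25, 0.15)

workflows = {
    "Invoice data extraction":     (5, 4, 5, 4),
    "Customer support triage":     (4, 4, 3, 5),
    "Employee onboarding Q&A":     (4, 3, 4, 3),
    "Contract risk flagging":      (3, 3, 2, 3),
    "Strategic pricing decisions": (1, 2, 1, 2),
}

def weighted(scores):
    # Weighted sum, rounded to two decimals for display
    return round(sum(w * s for w, s in zip(WEIGHTS, scores)), 2)

ranked = sorted(workflows.items(), key=lambda kv: weighted(kv[1]), reverse=True)
for name, scores in ranked:
    print(f"{name:<28} {weighted(scores):.2f}")
```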
The Pre-Build Data Readiness Checklist
Before scoping any AI implementation, verify:
```python
data_readiness = {
    "volume":  "500+ historical examples of the task completed?",
    "quality": "Is the historical data consistent and trustworthy (not riddled with exceptions)?",
    "labels":  "Do we have correct outputs for historical inputs (for supervised approaches)?",
    "recency": "Is the data representative of current conditions (not stale from 2+ years ago)?",
}

# Any "No": address the data problem before scoping the AI.
# AI quality is bounded by training/retrieval data quality — no model selection fixes bad data.
```
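A trivial gate over that checklist makes the rule enforceable in a planning script. A minimal sketch — the function and field names here are my own, not a standard API:

```python
def data_gaps(answers):
    """Return the checklist items answered 'no' — each one is a blocker to scoping."""
    return [item for item, ok in answers.items() if not ok]

answers = {"volume": True, "quality": True, "labels": False, "recency": True}
blockers = data_gaps(answers)
if blockers:
    print("Fix before scoping AI:", blockers)  # Fix before scoping AI: ['labels']
```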
The Two Infrastructure Questions Nobody Asks Until Week 8
Where does the AI output go?
If outputs need to integrate with a legacy system (SAP, Salesforce, or an older ERP), the integration layer is typically 40–60% of the total build effort. Map downstream systems before scoping the AI component.
What happens when the AI is wrong?
Every AI system has a failure rate. Define your acceptable error rate before building. What does a wrong output cost? Can you spot-check a sample? How do you handle low-confidence outputs? These are week-1 architecture decisions, not week-8 discoveries.
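One way to make that week-1 decision concrete is a confidence threshold that routes uncertain outputs to a human queue instead of straight through. A minimal sketch, assuming the model exposes a per-output confidence score; the threshold value and names are illustrative, to be tuned against your acceptable error rate:

```python
REVIEW_THRESHOLD = 0.85  # assumption: derive from the cost of a wrong output, not a default

def route(output, confidence):
    """Auto-accept high-confidence outputs; queue everything else for human review."""
    if confidence >= REVIEW_THRESHOLD:
        return ("auto_accept", output)
    return ("human_review", output)

print(route({"invoice_total": 1240.50}, 0.97))  # ('auto_accept', {'invoice_total': 1240.5})
print(route({"invoice_total": 88.00}, 0.61))    # ('human_review', {'invoice_total': 88.0})
```

The same routing function doubles as a sampling hook: logging every decision gives you the spot-check sample the paragraph above asks for.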
The One-Page Prioritisation Output
After scoring every workflow:
```
AI READINESS SCORECARD — [Company Name]
=========================================

TOP 3 AI PRIORITY WORKFLOWS:

1. Invoice Data Extraction    Score: 4.55   ROI: HIGH
   Data ready: YES       Integration: 3 weeks   First value: 4 weeks

2. Customer Support Triage    Score: 3.90   ROI: MEDIUM-HIGH
   Data ready: PARTIAL   Gap: Need 200 labelled examples

3. Employee Onboarding Q&A    Score: 3.55   ROI: MEDIUM
   Data ready: YES (Notion + Confluence)   Integration: 2 weeks

NOT YET READY (address these data gaps first):
- Contract Risk Flagging  (output verifiability gap)
- Strategic Pricing       (judgment-based, insufficient structure)
```
Run this before you talk to any AI vendor. Know your scores before you know the solution.