I analyzed the AI capabilities of 4 mid-market SaaS companies using only public data. Three distinct failure patterns emerged.
The Experiment
Over the past week, I ran public-signal analyses on 4 mid-market SaaS companies in the HR tech space. Each company positions AI as a core capability. Each has shipped multiple AI features. Each markets AI as their competitive differentiator.
I wanted to know: does the public engineering signal match the marketing?
The companies (anonymized):
| Company | Revenue | Employees | Customers | AI Features Shipped |
|---|---|---|---|---|
| Company A | ~$200M+ ARR | 900+ | 5,000+ | 3+ (analytics, summaries) |
| Company B | ~$25-50M | ~136 | 1,500+ | 6 (copilot, review wizard, meeting coach) |
| Company C | ~$75-100M | ~287 | 3,500+ | 6+ (predictive model, AI coach, reviews) |
| Company D | ~$50-57M | ~424-535 | 3,000+ | 5+ (predictive analytics, AI scheduling, gen-AI) |
Combined: roughly $350-400M in revenue. ~1,750 employees. 13,000+ customers.
The Signal: GitHub
Public GitHub repositories are an imperfect but meaningful signal for engineering capability. They show what a company's engineering team builds, values, and invests in.
Across all 4 companies:
Total public repos: 214 (178 + 5 + 29 + 2)
Repos involving machine learning: 0
Repos involving data science: 0
Repos involving model training: 0
Company A has 178 repos — all infrastructure, deployment tooling, and frontend libraries. Company C has 29 repos — all Django utilities, CI/CD, and API integrations. Company B has 5 repos — all forks of third-party libraries. Company D has 2 repos — one hackathon project and one webhook example.
Not a single company has a public repository that touches ML, data science, or model training.
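This kind of scan is easy to reproduce. A minimal sketch using the GitHub REST API (the org name below is hypothetical, and the keyword heuristic is mine, not a formal methodology):

```python
import json
import re
import urllib.request

# Single-token and phrase keywords that suggest a repo touches ML.
# These are illustrative; a real audit would also inspect languages and file trees.
SINGLE_KEYWORDS = {"ml", "pytorch", "tensorflow", "sklearn", "llm"}
PHRASE_KEYWORDS = ("machine learning", "data science", "deep learning",
                   "model training", "fine tuning")

def looks_ml_related(repo: dict) -> bool:
    """Heuristic: does the repo's name, description, or topic list mention ML?"""
    text = " ".join([repo.get("name") or "", repo.get("description") or ""]
                    + (repo.get("topics") or [])).lower()
    # Normalize separators so "machine-learning" and "machine_learning" both match.
    text = re.sub(r"[-_]", " ", text)
    tokens = set(re.findall(r"[a-z0-9]+", text))
    return bool(tokens & SINGLE_KEYWORDS) or any(p in text for p in PHRASE_KEYWORDS)

def fetch_public_repos(org: str) -> list[dict]:
    """Page through an org's public repos via the GitHub REST API (unauthenticated,
    so subject to rate limits)."""
    repos, page = [], 1
    while True:
        url = f"https://api.github.com/orgs/{org}/repos?per_page=100&page={page}"
        req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:
            return repos
        repos.extend(batch)
        page += 1

# Offline demo: an infrastructure repo should not be flagged.
sample = {"name": "deploy-scripts", "description": "Terraform modules for CI/CD",
          "topics": ["infrastructure"]}
print(looks_ml_related(sample))  # → False
```

For a live scan, `fetch_public_repos("your-org")` pulls every public repo and the classifier runs over the list. Token-level matching matters: a naive substring check on `"ml"` would flag every `html-utils` repo.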
Three Failure Patterns
Pattern 1: The Wrapper Trap
Seen in: Companies A, B (most clearly)
The Wrapper Trap is the most common pattern: shipping AI features built on managed LLM APIs (OpenAI, Claude, etc.) and marketing them as competitive differentiation.
The problem: if your AI is a wrapper on someone else's model, your competitor ships the same feature in weeks using the same API. There's no moat. There's no differentiation. There's a press release.
Company B (136 employees) has shipped 6 AI features in a year using this pattern, which means a competitor with 600 employees can ship the same features in a quarter. The durable advantage isn't shipping speed; it's the data. But the data is being used for dashboards, not training.
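To make "wrapper" concrete: a feature like a meeting coach can be little more than a prompt template plus one API call. This is a generic sketch against OpenAI's chat completions endpoint, not any specific vendor's code; the prompt wording and model name are illustrative:

```python
import json
import os
import urllib.request

def build_meeting_coach_prompt(transcript: str) -> list[dict]:
    """The entire 'secret sauce' of a wrapper feature is often this template."""
    return [
        {"role": "system",
         "content": "You are a meeting coach. Point out talk-time imbalance, "
                    "interruptions, and unclear action items."},
        {"role": "user", "content": f"Meeting transcript:\n{transcript}"},
    ]

def meeting_coach(transcript: str) -> str:
    """Call a managed LLM API. Any competitor with an API key can ship
    an identical feature; nothing here is proprietary."""
    body = json.dumps({
        "model": "gpt-4o-mini",
        "messages": build_meeting_coach_prompt(transcript),
    }).encode()
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Roughly thirty lines, no model, no training data. That's the point: the moat has to come from somewhere other than the code.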
Pattern 2: AI by Acquisition
Seen in: Companies C, D (most clearly)
When companies recognize they need AI capability faster than they can build it, they acquire. Company C acquired an AI coaching startup. Company D acquired an AI-powered scheduling company.
The strategy is understandable. The risk is structural.
Post-acquisition engineering attrition averages 33% within 18 months. If the acquired team leaves and the acquiring company has zero ML repos and no ML hiring, they've paid for capability they can't maintain.
Company C has 29 GitHub repos — zero ML — and recently acquired an AI startup. Their engineering team has contracted ~35% in the past year. Who maintains the AI if the acquired engineers leave?
Pattern 3: The Island Problem
Seen in: Company D (most clearly)
The Island Problem appears in companies that have solved the Wrapper Trap through multiple acquisitions or partnerships. They have genuine AI assets — but those assets can't talk to each other.
Company D has:
A university R&D partnership producing predictive ML models
An acquired startup running AI-powered scheduling algorithms
Core platform gen-AI features via LLM APIs
Three AI engines. Three different tech stacks. Three separate data models. Zero cross-pollination.
The scheduling AI can't learn from engagement data. The predictive models can't improve scheduling. The LLM features can't leverage either model's intelligence.
The whole is less than the sum of parts.
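One way to picture the missing layer: a shared feature store that every engine both writes to and reads from, instead of each engine owning a private data model. This is an entirely hypothetical sketch of the interface, not Company D's architecture:

```python
from collections import defaultdict
from typing import Any

class FeatureStore:
    """Hypothetical shared layer: each AI engine publishes its signals here
    and reads the other engines' signals at inference time."""

    def __init__(self) -> None:
        self._features: dict[str, dict[str, Any]] = defaultdict(dict)

    def publish(self, entity_id: str, feature: str, value: Any) -> None:
        """An engine records a signal about an entity (employee, team, shift)."""
        self._features[entity_id][feature] = value

    def snapshot(self, entity_id: str) -> dict[str, Any]:
        """Another engine reads everything currently known about that entity."""
        return dict(self._features[entity_id])

store = FeatureStore()
# The engagement model publishes a score...
store.publish("employee:42", "engagement_score", 0.31)
# ...which the scheduling engine can now read alongside its own signals.
store.publish("employee:42", "preferred_shift", "night")
print(store.snapshot("employee:42"))
```

The implementation is trivial; the organizational decision to route three acquisitions' data through one interface is the hard part, and it's the part that makes the whole exceed the sum.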
The Data Moat Paradox
Here's the most striking finding: the companies sitting on the richest proprietary data are the worst at using it for AI.
Combined, these 4 companies have:
Billions of proprietary data points (survey responses, performance reviews, scheduling patterns, learning completions)
Decades of domain expertise
Millions of end users generating continuous data
All of it is being used for dashboards, analytics, and reports. None of it is being used for model training, fine-tuning, or domain-specific intelligence.
The data moats exist. The ML engineering to exploit them doesn't.
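What "using the data for training" looks like in practice is often mundane: mapping historical records into the JSONL chat format that fine-tuning APIs accept. A minimal sketch, assuming a hypothetical review schema with `raw_feedback` and `manager_summary` fields:

```python
import json

def review_to_example(review: dict) -> dict:
    """Map one historical performance review to a chat-format fine-tuning
    example: the raw feedback goes in, the human-written summary is the target."""
    return {
        "messages": [
            {"role": "system", "content": "Summarize performance feedback for a manager."},
            {"role": "user", "content": review["raw_feedback"]},
            {"role": "assistant", "content": review["manager_summary"]},
        ]
    }

def write_jsonl(reviews: list[dict], path: str) -> int:
    """Emit one JSON object per line — the format most fine-tuning APIs accept."""
    with open(path, "w") as f:
        for r in reviews:
            f.write(json.dumps(review_to_example(r)) + "\n")
    return len(reviews)
```

Every record a competitor can't see becomes a training example a competitor can't replicate. The pipeline work is unglamorous, which may be exactly why nobody in this sample has done it.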
What Would Fix This
The fix isn't more AI features. It's three things:
- One ML engineer. A single ML engineer changes the trajectory. They can evaluate whether "predictive models" are genuine ML or regression-with-marketing. They can fine-tune a model on proprietary data. They can assess whether an acquisition's AI is maintainable.
- Data as training asset, not storage problem. Restructuring data pipelines to support model training — not just dashboards — is a one-time investment that compounds indefinitely. Every survey response, every performance review, every scheduling outcome becomes training data for models competitors can't replicate.
- Integration over acquisition. Before the next AI acquisition, invest in connecting existing AI assets. Make the scheduling AI learn from the engagement data. Make the predictive model inform the coaching recommendations. Integration creates compound value; isolated acquisition creates diminishing returns.

The Gap Is Widening

The gap between AI marketing and AI engineering in mid-market SaaS isn't closing. It's widening. Companies are shipping more AI features while building less ML capability. Right now, in February 2026, I'm not seeing mid-market SaaS companies closing this gap on their own. The data moats are there. The engineering to exploit them isn't.
Methodology: All analysis based on public signals — GitHub repositories, job postings, product pages, press coverage, and financial data. No proprietary information was accessed. All companies anonymized.
I'm Jarrad Bermingham — I build production AI agent systems and open-source developer tooling at Bifrost Labs. Find our tools @bifrostlabs on npm.