Leena Malhotra

Lessons From Building an Internal AI Tool Nobody Used

We spent three months building an AI-powered code review assistant. It could analyze pull requests, suggest improvements, catch potential bugs, and even generate documentation. The demos were impressive. The engineering was solid. The value proposition was clear.

Two weeks after launch, usage dropped to zero.

Not because it was broken. Not because it gave bad suggestions. It just never became part of anyone's actual workflow. The tool worked perfectly—it was just perfectly irrelevant to how our team actually worked.

This wasn't a technical failure. It was a product failure disguised as an engineering success.

The Problem We Thought We Had

The conversation started in a team retrospective. "Code reviews take too long," someone said. "We're spending hours on reviews that could be automated."

It was true. Our team was doing 40+ code reviews per week. Each took 20-30 minutes. Simple math suggested we were spending 15-20 hours per week on something AI could help with.

The solution seemed obvious: build an AI assistant that pre-reviews code before humans see it. It could catch style issues, identify potential bugs, suggest refactoring opportunities. Human reviewers could focus on architecture and business logic instead of nitpicking formatting.

We got approval to spend a sprint on a proof of concept. The POC worked well enough that we got buy-in for a full implementation. Three months later, we launched an internal tool that integrated with GitHub, analyzed every PR automatically, and posted helpful review comments.

The first week, people tried it. The second week, usage dropped by half. By week three, only the team that built it was still using it. A month later, even we had stopped.

What We Built (And Why It Didn't Matter)

The tool itself was good. We used Claude Sonnet 4.5 for code analysis and Gemini 2.5 Pro for generating documentation suggestions. The AI caught real issues—unused variables, potential null pointer exceptions, inefficient algorithms.

We built a clean interface that integrated directly into GitHub PR pages. Reviewers could see AI suggestions alongside manual comments. They could accept AI recommendations with one click or dismiss them if irrelevant.
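For concreteness, here is a simplified sketch of the core loop: fetch the diff for a PR, ask a model to review it, and post the result back as a comment. This is an illustration rather than our production code; the repo names, prompt, and model ID are placeholders, and it assumes the Anthropic Python SDK and GitHub's REST API.

```python
# Simplified sketch of the review loop. Repo names, prompt, and model ID
# are placeholders, not our exact configuration.
import os

import requests
from anthropic import Anthropic

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
GH_HEADERS = {"Authorization": f"Bearer {GITHUB_TOKEN}"}


def fetch_pr_diff(owner: str, repo: str, number: int) -> str:
    """Fetch the raw diff for a pull request from the GitHub REST API."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}",
        headers={**GH_HEADERS, "Accept": "application/vnd.github.v3.diff"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text


def review_diff(diff: str) -> str:
    """Ask the model for a short list of substantive review comments."""
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-sonnet-4-5",  # model ID is an assumption
        max_tokens=1024,
        system="You are a code reviewer. Flag only substantive issues.",
        messages=[{"role": "user", "content": f"Review this diff:\n\n{diff}"}],
    )
    return message.content[0].text


def post_comment(owner: str, repo: str, number: int, body: str) -> None:
    """Post the review as a single conversation comment on the PR."""
    resp = requests.post(
        f"https://api.github.com/repos/{owner}/{repo}/issues/{number}/comments",
        headers=GH_HEADERS,
        json={"body": body},
        timeout=30,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    diff = fetch_pr_diff("acme", "backend", 1234)  # hypothetical repo and PR
    post_comment("acme", "backend", 1234, review_diff(diff))
```

Even in a sketch this small, you can see where the friction comes from: every comment the model produces is one more thing a reviewer has to read, judge, and respond to.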

The engineering was solid. The AI was accurate. The UX was thoughtful.

And nobody used it because we had solved the wrong problem.

The Problem We Actually Had

After the tool failed, I started asking people why they weren't using it. The answers were illuminating:

"I don't mind spending time on code reviews. That's when I learn what the team is working on."

"The AI catches things that don't matter. Unused variables? The linter already shows those."

"I tried it for a week but kept having to explain to the AI author why certain patterns made sense in our codebase."

"Code review isn't slow because we're bad at it—it's slow because we're reviewing a lot of code. We need to write less code, not review it faster."

The pattern was clear: we had diagnosed "code reviews take too long" as a technical problem. It wasn't. It was a communication problem, a knowledge-sharing problem, and sometimes a scope-creep problem.

AI couldn't fix any of those.

The time spent in code reviews wasn't wasted—it was where junior developers learned from senior developers, where architectural decisions were discussed, where context was shared across teams. Making reviews faster would have made the team less cohesive, not more productive.

The Adoption Gap

Even when tools are technically good, adoption requires more than functionality. It requires fitting into existing workflows without friction.

Our AI code reviewer added friction:

It created more comments to process. Instead of reducing review burden, the AI added 5-10 comments per PR. Even when suggestions were valid, reviewers now had more to read, evaluate, and respond to.

It required explaining context the AI didn't have. Our codebase had patterns that made sense given our constraints but looked like anti-patterns to generic AI. Reviewers spent time explaining to the AI (or to other reviewers reading AI comments) why certain code was intentionally written that way.

It didn't integrate with how reviews actually happened. Code reviews weren't just async GitHub comments. They were Slack conversations, pair programming sessions, architecture discussions in meetings. The AI only saw the PR—it missed all the context around it.

It optimized for coverage, not insight. The AI commented on everything it could analyze. Human reviewers were selective—they commented on what mattered. The AI's comprehensive approach buried the suggestions that were genuinely useful.

What We Should Have Built

Six months later, after the code review tool was dead, we built something different. Not an AI code reviewer—a tool that helped engineers write better PR descriptions.

The insight came from noticing what actually made code reviews slow: poorly described changes. When a PR explained what changed and why, reviews were fast. When the description was just "fixed bug" or "refactored component," reviews took forever because reviewers had to figure out intent from code alone.

We built a simple tool: before creating a PR, engineers could use an AI writing assistant to draft a clear description based on their commit messages and code changes. The AI would ask clarifying questions: "What problem does this solve? What alternatives did you consider? Are there edge cases reviewers should know about?"
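In spirit, the tool was not much more than the sketch below: gather the branch's commits and diff stat, ask the author a few clarifying questions, and let a model assemble a first draft. The base branch, questions, and model ID here are illustrative assumptions, not the exact implementation.

```python
# Sketch of the description assistant. The base branch, questions, and
# model ID are illustrative; the real tool's prompts differed.
import subprocess

from anthropic import Anthropic

QUESTIONS = [
    "What problem does this change solve?",
    "What alternatives did you consider?",
    "Are there edge cases reviewers should know about?",
]


def git(*args: str) -> str:
    """Run a git command in the current repo and return its stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def draft_description() -> str:
    commits = git("log", "main..HEAD", "--oneline")
    diff_stat = git("diff", "main...HEAD", "--stat")
    answers = "\n".join(f"Q: {q}\nA: {input(q + ' ')}" for q in QUESTIONS)

    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-sonnet-4-5",  # model ID is an assumption
        max_tokens=800,
        messages=[{
            "role": "user",
            "content": (
                "Draft a clear pull request description from this material. "
                "Do not invent details.\n\n"
                f"Commits:\n{commits}\n"
                f"Changed files:\n{diff_stat}\n"
                f"Author's answers:\n{answers}"
            ),
        }],
    )
    return message.content[0].text


if __name__ == "__main__":
    print(draft_description())
```

The model only sees what the author gives it; the clarifying questions exist to pull out the context that never shows up in the diff.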

The result wasn't comprehensive analysis of the code—it was a better prompt for human reviewers. And people actually used it, because it made their job easier without adding cognitive overhead.

This tool succeeded because it solved the actual problem: making it easier for reviewers to understand context quickly. It didn't try to replace human judgment—it augmented the information humans needed to exercise that judgment.

The Patterns That Predict Failure

Looking back, there were warning signs we ignored:

We built it because we could, not because anyone asked for it. The team said reviews were slow. Nobody said "we need an AI code reviewer." We invented the solution, then tried to convince people they needed it.

We optimized for demo impact, not daily utility. The tool looked impressive in presentations. It caught bugs, suggested improvements, generated docs. But daily utility isn't about capability—it's about fitting seamlessly into existing workflows with minimal friction.

We measured technical success, not behavioral adoption. We tracked how many PRs the AI analyzed and how accurate its suggestions were. We didn't measure whether people were actually changing their review process or finding the tool useful.

We assumed the stated problem was the real problem. "Code reviews take too long" seemed like a clear problem statement. But it wasn't. The real issues were poor PR descriptions, unclear change scope, and lack of shared context. Code review duration was a symptom, not a disease.

We built for ourselves, then were surprised others didn't adopt it. The team that built the tool used it because we understood its quirks, forgave its limitations, and had context about why certain features existed. Everyone else had none of that context.

What Actually Drives Adoption

After multiple failed internal tools and a few successful ones, patterns emerged about what makes internal AI tools actually get used:

Solve a problem people actively complain about. Not a problem you observe—a problem they articulate. If nobody's asking for a solution, you're probably solving the wrong problem.

Make the first use effortless. If it takes more than 30 seconds to understand value, most people won't bother. Our PR description tool worked because you could try it once and immediately see whether it helped.

Integrate into existing workflows, don't create new destinations. People won't add another tool to their routine. They'll use tools that work where they already are. Integration means fitting the workflow, not just the UI: our code reviewer lived inside GitHub but still demanded a new way of reviewing, while our Slack-based PR description helper slotted into something engineers were already doing before opening a PR.

Optimize for the median user, not the power user. We built features that power users might appreciate—detailed analysis, customizable rules, comprehensive reports. The median user just wanted their review done faster. Feature complexity drove them away.

Reduce cognitive load, don't add to it. Every AI suggestion requires evaluation: Is this right? Does it apply here? Should I act on it? If you're adding more decisions than you're removing, you're making work harder, not easier.

The Tool That Actually Worked

The internal tool that finally succeeded wasn't the most sophisticated one we built. It was the simplest.

Engineers writing incident reports would paste their rough notes into a text improvement tool that would structure them into clear, concise summaries. No complex analysis. No multi-step workflows. Just: paste messy notes, get clean report.
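The whole thing was barely more complicated than this sketch, where the prompt and model ID are stand-ins for whatever you would actually use:

```python
# Minimal sketch: rough notes in, structured report out. Prompt and model
# ID are stand-ins.
import sys

from anthropic import Anthropic

PROMPT = (
    "Rewrite these rough incident notes as a concise report with sections "
    "Summary, Timeline, Impact, and Follow-ups. Do not invent details.\n\n"
)


def clean_notes(notes: str) -> str:
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-sonnet-4-5",  # model ID is an assumption
        max_tokens=1000,
        messages=[{"role": "user", "content": PROMPT + notes}],
    )
    return message.content[0].text


if __name__ == "__main__":
    print(clean_notes(sys.stdin.read()))
```

Run it as something like `python clean_notes.py < rough-notes.txt` (the filename is just for illustration) and paste the result into the incident doc.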

It worked because:

  • The need was obvious (incident reports are painful to write)
  • The value was immediate (clean report in seconds)
  • The workflow was trivial (paste, click, copy)
  • The output required minimal editing
  • It didn't try to replace thinking, just formatting

Usage grew organically. People who saw good incident reports asked how they were written. The tool spread through demonstration, not evangelism.

We later added features: automatically extracting key information from chat logs, generating timeline summaries, suggesting action items. But we added these only after core usage was solid, and only when people explicitly asked for them.

What We Learned About Internal AI Tools

Building successful internal tools requires different thinking than building customer products:

Start with workflow observation, not problem statements. Watch how people actually work. Don't ask them what they need—most don't know. Look for repeated frustrations, workarounds, or manual processes that happen daily.

Build the minimum viable intervention. Don't build a comprehensive solution to a general problem. Build the smallest thing that removes one specific point of friction. Expand only if people ask.

Design for viral adoption, not top-down rollout. The best internal tools spread because people see them being useful, not because they're announced in company-wide emails. Make the value obvious to observers.

Measure usage, not capability. Your AI can be 99% accurate and still be useless if nobody uses it. Track daily active users, retention, and organic growth—not technical metrics.

Accept that most ideas will fail. We built five internal AI tools. One succeeded, one got moderate use, three were abandoned. That's normal. The key is failing fast and learning from each failure.

The Real Lesson

The lesson isn't "don't build internal AI tools." It's "understand the difference between solving a technical problem and solving a workflow problem."

AI excels at pattern recognition, generation, and analysis. But most workflow problems aren't technical—they're about communication, context, coordination, and cognitive load.

Before building an internal tool, ask:

  • What workflow friction are we actually trying to remove?
  • Will this tool fit into existing habits or require new ones?
  • Are we solving a problem people articulate or a problem we observed?
  • Can we validate value with a manual process before building automation?
  • What's the absolute minimum version that could be useful?

Use platforms like Crompt AI to quickly prototype and test different AI approaches before committing to building custom tools. The ability to experiment with multiple AI models helps you validate whether AI is even the right solution.

Most importantly: be willing to kill your tools. We got better at building useful internal tools not by making our successful ones more sophisticated, but by abandoning our failed ones faster and learning from why they failed.

The Uncomfortable Truth

The code review assistant we built was technically impressive. The engineering was solid. The AI was accurate. And it failed completely.

Success in internal tooling isn't about building impressive technology. It's about making people's actual work easier in ways they actually care about.

Sometimes that means building AI tools. Often it means building something much simpler that AI happens to make possible. Occasionally it means building nothing at all and accepting that the current workflow, however imperfect, is better than any automated alternative.

The hardest lesson from building an internal tool nobody used wasn't about AI or engineering. It was about product thinking: the solution you can build isn't always the solution people need.

Your job isn't to apply AI to problems. It's to solve problems, and sometimes AI isn't the answer.

-Leena:)
