<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Louis Dupont</title>
    <description>The latest articles on DEV Community by Louis Dupont (@louis-dupont).</description>
    <link>https://dev.to/louis-dupont</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2486626%2F80afe7fa-ad69-44cb-94a0-d0c5b372d0a1.jpeg</url>
      <title>DEV Community: Louis Dupont</title>
      <link>https://dev.to/louis-dupont</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/louis-dupont"/>
    <language>en</language>
    <item>
      <title>How to move beyond Vibe Checking</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Thu, 10 Apr 2025 14:29:16 +0000</pubDate>
      <link>https://dev.to/louis-dupont/how-to-move-beyond-vibe-checking-57hn</link>
      <guid>https://dev.to/louis-dupont/how-to-move-beyond-vibe-checking-57hn</guid>
      <description>&lt;p&gt;&lt;em&gt;When developing AI, Vibe Checking is a must. Until a certain point.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you start an AI project, everything feels like progress. You tweak a prompt, add context, examples, or even plug in a retrieval system. It looks better. So you keep going.&lt;/p&gt;

&lt;p&gt;But eventually, it stops being clear what “better” even means.&lt;/p&gt;

&lt;p&gt;You didn't break anything. But you're not moving forward either. The outputs are different, but not obviously more useful. You tweak again. And again. Some changes help. Some don't. Some feel promising until a week later when a user hits a strange edge case you thought was gone.&lt;/p&gt;

&lt;p&gt;At some point, you start to wonder:&lt;br&gt;&lt;br&gt;
Are we still improving this? Or are we just getting used to how it behaves?&lt;/p&gt;

&lt;p&gt;That's &lt;strong&gt;vibe-checking&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
You're not iterating. You're running changes through your gut and hoping they stick.&lt;/p&gt;

&lt;p&gt;And that's not a critique. That's the way to get started.&lt;/p&gt;

&lt;p&gt;But it's a phase you're supposed to grow out of.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Vibe-Checking Stops Working
&lt;/h2&gt;

&lt;p&gt;In early prototypes, vibes are enough. You're testing if the core idea even makes sense. You're looking for signal, not stability. So you move fast. You don't overthink. You don't measure. Good.&lt;/p&gt;

&lt;p&gt;But once you've seen the potential (once you're no longer validating the idea, but trying to improve it) &lt;strong&gt;you need more than just a feeling.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The problem isn't that you trust your gut.&lt;/p&gt;

&lt;p&gt;It's that your gut doesn't scale.&lt;/p&gt;

&lt;p&gt;You change a prompt. The answers look cleaner. Then someone else on your team flags a regression you didn't notice. The improvements were real, but only on the five examples you had in mind.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shift You Actually Need
&lt;/h2&gt;

&lt;p&gt;This is where most teams start thinking about metrics.&lt;/p&gt;

&lt;p&gt;But good metrics don't come out of nowhere.&lt;/p&gt;

&lt;p&gt;They come from understanding what matters.&lt;/p&gt;

&lt;p&gt;That's the real shift: &lt;strong&gt;moving from vibes to clarity.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Not by jumping into evals, but by seriously observing what's going wrong.&lt;/p&gt;

&lt;p&gt;That means sitting with the outputs. Looking at dozens of real examples. Tagging what failed and why. Not just “bad answer.” Not just “hallucination.” Specific, meaningful categories: wrong reference pulled, misunderstood intent, incomplete summary, broken format.&lt;/p&gt;

&lt;p&gt;You don't need 50 dashboards. You don't even need to automate anything.&lt;/p&gt;

&lt;p&gt;You need to name the failures you're seeing again and again.&lt;/p&gt;
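&lt;p&gt;As a rough sketch, that tagging pass can be as simple as a list of reviewed outputs and a counter. The examples, tag names, and counts below are purely illustrative:&lt;/p&gt;

```python
from collections import Counter

# Hypothetical failure-mode tags after manually reviewing real outputs.
# The categories and counts are illustrative, not from a real project.
reviewed = [
    {"id": 1, "tags": ["wrong_reference"]},
    {"id": 2, "tags": []},  # no failure observed
    {"id": 3, "tags": ["misunderstood_intent", "broken_format"]},
    {"id": 4, "tags": ["incomplete_summary"]},
    {"id": 5, "tags": ["wrong_reference"]},
]

# Count how often each failure mode shows up across the reviewed set.
counts = Counter(tag for example in reviewed for tag in example["tags"])
for tag, n in counts.most_common():
    print(tag, n)
```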

&lt;h2&gt;
  
  
  Clarity is a Practice
&lt;/h2&gt;

&lt;p&gt;Here's the trick no one tells you:&lt;br&gt;&lt;br&gt;
You can't scale what you haven't named. &lt;/p&gt;

&lt;p&gt;Vibes are raw data. Clarity is the result of processing them.&lt;/p&gt;

&lt;p&gt;If you do it right, i.e. if you go through 50, 100, 200 real examples and tag the failure modes, you'll start to see a pattern. Some failures happen more than you thought. Some are rare, but critical. Some only show up on specific query types.&lt;/p&gt;

&lt;p&gt;Suddenly, your fixes aren't abstract anymore. They're targeted. And &lt;strong&gt;you can evaluate their impact&lt;/strong&gt; by measuring how often each failure mode occurs.&lt;/p&gt;
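&lt;p&gt;Concretely, that measurement can be a before/after comparison of failure-mode rates. The numbers below are made up; in practice they come from tagging 100+ real examples:&lt;/p&gt;

```python
# Illustrative sketch: compare failure-mode frequency before and after a change.
before = {"wrong_reference": 18, "misunderstood_intent": 9, "broken_format": 4}
after = {"wrong_reference": 5, "misunderstood_intent": 10, "broken_format": 4}
total_before, total_after = 100, 100  # tagged examples in each round

for mode in before:
    rate_before = before[mode] / total_before
    rate_after = after[mode] / total_after
    print(f"{mode}: {rate_before:.0%} to {rate_after:.0%}")
```

&lt;p&gt;A fix that cuts one mode while quietly inflating another shows up immediately in a table like this.&lt;/p&gt;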

&lt;p&gt;You're not guessing anymore. You're engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stop Tweaking. Start Observing.
&lt;/h2&gt;

&lt;p&gt;You don't need to jump into full-on evals. Not yet.&lt;/p&gt;

&lt;p&gt;But you do need to stop assuming that “looks better” is the same as “is better.”&lt;/p&gt;

&lt;p&gt;If you want to improve your system systematically, it starts by asking one simple question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What's actually going wrong? And how often?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Until you have that answer, everything else is just educated guessing.&lt;/p&gt;

&lt;p&gt;📌 Want to go deeper?&lt;br&gt;
👉 I'm sharing my &lt;a href="https://louis-dupont.github.io/Blog/?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=tech-ai&amp;amp;utm_content=article_002" rel="noopener noreferrer"&gt;insights from building AI for years&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>machinelearning</category>
      <category>llm</category>
    </item>
    <item>
      <title>Why Most AI Teams Are Stuck 🤔</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Wed, 19 Mar 2025 16:29:36 +0000</pubDate>
      <link>https://dev.to/louis-dupont/why-most-ai-teams-are-stuck-l55</link>
      <guid>https://dev.to/louis-dupont/why-most-ai-teams-are-stuck-l55</guid>
      <description>&lt;p&gt;A few years ago, I worked on a Generative AI project, a customer-facing AI assistant. The company had &lt;strong&gt;great data&lt;/strong&gt; and was convinced AI could turn it into something valuable.&lt;/p&gt;

&lt;p&gt;We built a prototype fast. Users were excited.&lt;/p&gt;

&lt;p&gt;Iteration was quick. Each tweak made the AI feel better.&lt;/p&gt;

&lt;p&gt;Then we hit a wall.&lt;/p&gt;

&lt;p&gt;We kept changing things, but… &lt;strong&gt;was it actually getting better? Or just different?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We didn't know.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;When "Iterating" Is Just Making Random Changes&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At first, improving the AI felt obvious. We spotted issues, fixed them, and saw real progress. But suddenly, &lt;strong&gt;everything slowed down.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some changes made things better, but we weren't sure why.&lt;/li&gt;
&lt;li&gt;Other changes made things worse, but we couldn't explain how.&lt;/li&gt;
&lt;li&gt;Sometimes, things just felt… different, not actually better.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It took me way too long to realize: &lt;strong&gt;we weren't iterating. We were guessing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We were tweaking prompts, adjusting retrieval parameters, fine-tuning the model… but none of it was &lt;strong&gt;measured.&lt;/strong&gt; We were just &lt;strong&gt;testing on a few cherry-picked examples&lt;/strong&gt; and convincing ourselves that it felt better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;And that's exactly how most AI teams get stuck.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Better on a Few Examples Isn't Better
&lt;/h2&gt;

&lt;p&gt;When you're close to a project, it's easy to &lt;strong&gt;think you can tell when something improves.&lt;/strong&gt; You run a few tests. The output looks better. So you assume progress.&lt;/p&gt;

&lt;p&gt;But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did it actually improve across the board?&lt;/li&gt;
&lt;li&gt;Did it break something else in the process?&lt;/li&gt;
&lt;li&gt;Are you fixing what users actually care about or just what you noticed?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams think they're iterating. They're just moving in random directions 🐔&lt;/p&gt;

&lt;h2&gt;
  
  
  Iterate Without Measurement... and Fail!
&lt;/h2&gt;

&lt;p&gt;And that's the real problem.&lt;/p&gt;

&lt;p&gt;Most teams, when they hit this wall, do what we did: &lt;strong&gt;try more things.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More prompt tweaks.&lt;/li&gt;
&lt;li&gt;More model adjustments.&lt;/li&gt;
&lt;li&gt;More retrieval fine-tuning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But real iteration isn't about making changes. It's about knowing, at every step, whether those changes actually work.&lt;/p&gt;

&lt;p&gt;Without that, you're just &lt;strong&gt;optimizing in the dark.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  So What's the Fix?
&lt;/h2&gt;

&lt;p&gt;The teams that move past this don't just &lt;strong&gt;build better models&lt;/strong&gt;, they build &lt;strong&gt;better ways to measure what “better” means.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of relying on gut feeling, they:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define clear success criteria. What actually makes an answer useful?&lt;/li&gt;
&lt;li&gt;Measure changes systematically. Not just on a few cherry-picked examples.&lt;/li&gt;
&lt;li&gt;Make sure improvements don't break what already works.&lt;/li&gt;
&lt;/ul&gt;
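&lt;p&gt;A minimal sketch of what “measure changes systematically” can look like: a fixed eval set run through an explicit success criterion. &lt;code&gt;run_model&lt;/code&gt;, &lt;code&gt;is_useful&lt;/code&gt;, and the queries are all placeholders for your own system and your own definition of success:&lt;/p&gt;

```python
# Minimal sketch of systematic measurement over a fixed example set.
def run_model(query):
    return f"Answer to: {query}"  # stand-in for your AI system

def is_useful(query, answer):
    # Placeholder success criterion; replace with your own definition of "useful".
    return answer.startswith("Answer")

# A fixed eval set, so every change is scored on the same examples.
eval_set = ["refund policy?", "reset my password", "order status for my package"]
scores = [is_useful(q, run_model(q)) for q in eval_set]
print(f"useful: {sum(scores)}/{len(scores)}")
```

&lt;p&gt;Rerunning the same set after each change is what turns “it feels better” into a number you can compare.&lt;/p&gt;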

&lt;h2&gt;
  
  
  &lt;strong&gt;The Bottom Line&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Most AI teams don't struggle to build AI. They struggle to improve it.&lt;/p&gt;

&lt;p&gt;I learned this the hard way. But once I started treating iteration as something that needs &lt;strong&gt;clear feedback loops, not gut feeling&lt;/strong&gt;, everything changed.&lt;/p&gt;

&lt;p&gt;In a following article, I'll break down &lt;strong&gt;how to actually measure AI improvement without getting trapped by misleading metrics.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;Follow to get notified when it's out.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;📌 In the meantime, if you want to &lt;strong&gt;go deeper&lt;/strong&gt; on AI iteration and continuous improvement, check out my &lt;strong&gt;&lt;a href="https://louis-dupont.github.io/Blog/?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=tech-ai&amp;amp;utm_content=article_001" rel="noopener noreferrer"&gt;Blog&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>rag</category>
      <category>llm</category>
    </item>
    <item>
      <title>Evaluate your LLM! Ok, but what's next? 🤷‍♂️</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Sun, 16 Feb 2025 13:23:00 +0000</pubDate>
      <link>https://dev.to/louis-dupont/evaluate-your-llm-ok-but-whats-next-3mk3</link>
      <guid>https://dev.to/louis-dupont/evaluate-your-llm-ok-but-whats-next-3mk3</guid>
      <description>&lt;p&gt;&lt;strong&gt;Everyone say you need to Evaluate your LLM. You just did it. Now what? 🤷‍♂️&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You got a score. Great. Now, here’s the trap:  &lt;/p&gt;

&lt;p&gt;You either:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trust it.&lt;/strong&gt; (&lt;em&gt;"Nice, let's ship!"&lt;/em&gt;)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chase a better one.&lt;/strong&gt; (&lt;em&gt;"Tweak some stuff and re-run!"&lt;/em&gt;)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both are &lt;strong&gt;horrible ideas.&lt;/strong&gt;  &lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 1: Stop staring at numbers.&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Numbers feel scientific, but &lt;strong&gt;they lie all the time.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Before doing anything, look at actual examples. &lt;strong&gt;What’s failing?&lt;/strong&gt;   &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bad output? &lt;strong&gt;Fix the model.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Good output but bad score? &lt;strong&gt;Fix the eval.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Both wrong? &lt;strong&gt;You’ve got bigger problems.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
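&lt;p&gt;One reading of that checklist, as code (the “both wrong” branch here assumes the eval scored a bad output as good, i.e. the eval missed it):&lt;/p&gt;

```python
# Sketch of the triage above: decide whether to fix the model or the eval.
# In practice, a human judges each example; the mapping is one reading of the list.
def triage(output_is_good, score_says_good):
    if output_is_good and not score_says_good:
        return "fix the eval"    # good output, bad score
    if not output_is_good and score_says_good:
        return "bigger problems"  # both wrong: bad output AND the eval missed it
    if not output_is_good:
        return "fix the model"   # bad output, correctly flagged
    return "looks fine"

print(triage(output_is_good=True, score_says_good=False))
```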

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 2: Solve the right problem.&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If your &lt;strong&gt;model sucks&lt;/strong&gt;, tweak:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompts
&lt;/li&gt;
&lt;li&gt;Data retrieval
&lt;/li&gt;
&lt;li&gt;Edge cases
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your &lt;strong&gt;eval sucks&lt;/strong&gt;, rethink:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your scoring function
&lt;/li&gt;
&lt;li&gt;What “good” even means
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Step 3: Iterate like a maniac.&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Change something → Run eval → Learn → Repeat.  &lt;/p&gt;

&lt;p&gt;Basically, do &lt;a href="https://dev.to/louis-dupont/error-analysis-stop-guessing-start-fixing-ai-models-338n"&gt;Error Analysis&lt;/a&gt; on your Evals (instead of on your LLM)!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chasing numbers isn’t progress.&lt;/strong&gt; Chasing the right insights is.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>DO NOT use these LLM Metrics ⛔ And what to do instead!</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Sat, 15 Feb 2025 16:08:54 +0000</pubDate>
      <link>https://dev.to/louis-dupont/do-not-use-these-llm-metrics-and-what-to-do-instead-4b95</link>
      <guid>https://dev.to/louis-dupont/do-not-use-these-llm-metrics-and-what-to-do-instead-4b95</guid>
      <description>&lt;p&gt;In two words: &lt;strong&gt;Generalist LLM metrics are more of a danger than an opportunity.&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NEVER&lt;/strong&gt; start with them.
&lt;/li&gt;
&lt;li&gt;Use them only as a &lt;strong&gt;last resort&lt;/strong&gt;—and even then, with strict guidelines!&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  So what are these vague, generic metrics?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Helpfulness
&lt;/li&gt;
&lt;li&gt;Conciseness
&lt;/li&gt;
&lt;li&gt;Tone
&lt;/li&gt;
&lt;li&gt;Personalisation
&lt;/li&gt;
&lt;li&gt;… and more!
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But what’s so wrong with them?  &lt;/p&gt;

&lt;h2&gt;
  
  
  These Metrics Lack Real Meaning
&lt;/h2&gt;

&lt;p&gt;The biggest problem? They’re designed to evaluate an &lt;strong&gt;LLM in general&lt;/strong&gt;, not a &lt;strong&gt;specific use case&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;By definition, they apply broadly—but do they truly matter? More often than not, they have &lt;strong&gt;weak correlations with user satisfaction&lt;/strong&gt; and even &lt;strong&gt;weaker ties to actual ROI&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;And what do they really measure?  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Conciseness?&lt;/strong&gt; What does "concise" even mean? It depends on your use case - and &lt;strong&gt;your&lt;/strong&gt; definition.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Helpfulness?&lt;/strong&gt; How do you objectively assess that?
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At best, these metrics provide vague direction. At worst, they &lt;strong&gt;create the illusion that we’re measuring something meaningful&lt;/strong&gt; - when we’re not.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Start with the Problem, Not the Solution
&lt;/h2&gt;

&lt;p&gt;In the startup world, everyone preaches this - but few apply it when developing AI.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every metric should start with a strong "why."&lt;/strong&gt; The best way to get this right?&lt;br&gt;
👉 &lt;a href="https://dev.to/louis-dupont/error-analysis-stop-guessing-start-fixing-ai-models-338n"&gt;&lt;strong&gt;Do error analysis on your data.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let real-world failures guide you to the right metrics - &lt;strong&gt;not the other way around.&lt;/strong&gt;  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>openai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Error Analysis 🔧 Stop Guessing, Start Fixing AI Models</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Fri, 14 Feb 2025 11:10:50 +0000</pubDate>
      <link>https://dev.to/louis-dupont/error-analysis-stop-guessing-start-fixing-ai-models-338n</link>
      <guid>https://dev.to/louis-dupont/error-analysis-stop-guessing-start-fixing-ai-models-338n</guid>
      <description>&lt;p&gt;Error analysis is about digging deep into &lt;em&gt;why&lt;/em&gt; something isn’t working - to learn from it. It might sound obvious, but it's shockingly underused, especially where it matters most: AI development.&lt;/p&gt;

&lt;p&gt;Let's explore what it is through an example.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cats or Dogs?
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;I'm skipping many details that may hurt Data Scientists for the sake of simplicity.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Say you have 200 images to classify as either cats or dogs. You build an AI and get &lt;strong&gt;78% accuracy&lt;/strong&gt; - not great. We need to do better. But how?&lt;/p&gt;

&lt;p&gt;The typical response? &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Let's try another model"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Let's tweak the (hyper)parameters and hope for the best."&lt;/em&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Basically, this means blindly exploring different solutions to see what sticks. Then, we &lt;em&gt;hope&lt;/em&gt; to learn something and slowly converge on a better solution.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;But what if you could already learn what you want with this very first run?&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Let's do error analysis!
&lt;/h3&gt;

&lt;p&gt;You dig into your data and realize:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Some puppies were classified as cats.&lt;/li&gt;
&lt;li&gt;Some images are completely dark - even you can't tell if it's a cat or a dog!&lt;/li&gt;
&lt;li&gt;Finally, some were actually mislabeled!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After removing these irrelevant samples (&lt;em&gt;points 2 &amp;amp; 3&lt;/em&gt;), your model actually achieves &lt;strong&gt;97% accuracy&lt;/strong&gt;! The remaining 3% error comes from puppies being misclassified as cats.&lt;/p&gt;

&lt;p&gt;The problem was not your model, but the data it was given.&lt;/p&gt;

&lt;p&gt;Well, &lt;em&gt;almost.&lt;/em&gt; There's still the issue of puppies being misclassified - this is a &lt;strong&gt;failure mode&lt;/strong&gt;.&lt;/p&gt;
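&lt;p&gt;For the curious, the arithmetic behind those two figures can be reproduced. The split between dark, mislabeled, and puppy errors below is assumed, since only the 78% and 97% are given:&lt;/p&gt;

```python
# Worked sketch of the accuracy arithmetic, with an assumed error split.
total = 200
errors = total - int(total * 0.78)     # 44 errors at 78% accuracy
invalid = 40                           # assumed: dark or mislabeled images, removed
invalid_errors = 39                    # assumed: nearly all invalid samples scored wrong
kept = total - invalid                 # 160 valid images remain
kept_errors = errors - invalid_errors  # 5 puppy-as-cat confusions left
accuracy = (kept - kept_errors) / kept
print(f"{accuracy:.0%}")               # about 97%
```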

&lt;h3&gt;
  
  
  What does this actually mean?
&lt;/h3&gt;

&lt;p&gt;In this case, we have at least three clear action items:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Correct the mislabeled samples.&lt;/li&gt;
&lt;li&gt;Find a way to make the model better on puppy images (there are many!).&lt;/li&gt;
&lt;li&gt;Ensure proper lighting for production cameras 🤷‍♂️&lt;/li&gt;
&lt;li&gt;... and plenty more!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, on the next iteration, do the same: you may uncover new problems!&lt;/p&gt;

&lt;p&gt;Basically, &lt;strong&gt;error analysis is what moves you past blindly tweaking solutions in hopes of improvement&lt;/strong&gt;. Instead, it shifts the focus to understanding the root causes of failure and addressing them directly. &lt;/p&gt;

</description>
      <category>ai</category>
      <category>rag</category>
      <category>machinelearning</category>
      <category>llm</category>
    </item>
    <item>
      <title>LLM Evals - The Trap No One’s Telling You 🐔</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Thu, 02 Jan 2025 20:34:33 +0000</pubDate>
      <link>https://dev.to/louis-dupont/llm-evals-the-trap-no-ones-telling-you-1p2e</link>
      <guid>https://dev.to/louis-dupont/llm-evals-the-trap-no-ones-telling-you-1p2e</guid>
      <description>&lt;p&gt;We hear it more and more: ‘Use LLM Evaluations to guide your AI project.’ And for a good reason—metrics are essential. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Yet, there’s a trap nobody talks about...&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s say you have a chatbot and want to introduce metrics. You find tools that compute metrics like 'Helpfulness', 'Conciseness', and 'Completeness'. &lt;br&gt;
Sounds great—they promise to optimise your user’s experience. Right?&lt;/p&gt;

&lt;p&gt;Truth is, their correlation to real business value is often unclear. Is this really what your user cares about? Will this increase adoption?&lt;/p&gt;

&lt;p&gt;Many teams end up measuring the wrong thing, thinking they’re being data-driven, while forgetting about what really matters.&lt;/p&gt;

&lt;p&gt;Metrics aren’t inherently good. They’re only as useful as the questions they help you answer.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you don’t ask ‘What does success look like?’ or ‘What is the goal I want to measure?’ your metrics aren’t leading you—they’re misleading you.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So, the next time you set metrics, ask yourself: Are you measuring what impacts your business goals—or just what’s easy to quantify?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The difference might explain why your AI project feels stuck.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Because chasing the wrong metrics isn’t progress. It’s running in circles—like a headless chicken.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu3m9w2k4jie443v2085.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnu3m9w2k4jie443v2085.jpg" alt="Evaluation Trap" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
    </item>
    <item>
      <title>📉 Why Improving Your AI Model Is Killing Your Project’s Success</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Wed, 01 Jan 2025 15:41:57 +0000</pubDate>
      <link>https://dev.to/louis-dupont/why-improving-your-ai-model-is-killing-your-projects-success-4kkn</link>
      <guid>https://dev.to/louis-dupont/why-improving-your-ai-model-is-killing-your-projects-success-4kkn</guid>
      <description>&lt;p&gt;&lt;em&gt;What if improving your AI model is the very thing holding your project back?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You’ve spent weeks fine-tuning it—polishing every detail, boosting accuracy, solving edge cases. Yet, adoption hasn’t moved. &lt;em&gt;Frustrating?&lt;/em&gt; You’re not alone—&lt;strong&gt;this is a trap many AI teams fall into.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The problem isn’t that AI isn’t ready. It’s that the way we approach AI makes us feel productive while ignoring the real challenge: solving critical user needs.&lt;/p&gt;

&lt;p&gt;Let’s break down why this happens—and how you can escape the trap.&lt;/p&gt;




&lt;h3&gt;
  
  
  Why Metrics Make You Feel Safe—But Keep You Stuck
&lt;/h3&gt;

&lt;p&gt;AI metrics like accuracy, precision, and recall feel reassuring. They’re tangible. They give you a clear sense of progress.&lt;/p&gt;

&lt;p&gt;But here’s the uncomfortable truth: &lt;strong&gt;metrics create the illusion of progress.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Teams rely on metrics because they’re easier to measure than user success. A 5% boost in accuracy feels like a win—even if it doesn’t move the needle on user adoption.&lt;/p&gt;

&lt;p&gt;One team I worked with spent months improving a model to handle nuanced queries. Accuracy jumped, but user engagement didn’t. Why? Users didn’t care about nuance—they wanted instant answers. When we pivoted to a simpler Q&amp;amp;A database, adoption skyrocketed. The problem wasn’t the model. It was what we thought the model should solve.&lt;/p&gt;

&lt;p&gt;Metrics are a comfort zone. &lt;strong&gt;They distract from the harder, messier question&lt;/strong&gt;: &lt;em&gt;What do my users actually need?&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why “Listening to Feedback” Is a Dangerous Half-Truth
&lt;/h3&gt;

&lt;p&gt;Most teams think they’re user-focused because they collect feedback. They track adoption metrics. They tweak features based on what users ask for. But here’s the trap: &lt;strong&gt;listening to users isn’t the same as solving their problems.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feedback reflects what users think they want&lt;/strong&gt;—not necessarily what they’ll use.
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adoption metrics only show you the symptoms, not the causes.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One team built a highly sophisticated recommendation system based on user requests. It worked beautifully—on paper. But users didn’t engage because it added complexity to a process they already found overwhelming.&lt;/p&gt;

&lt;p&gt;The takeaway? User feedback is a starting point, not a roadmap. Solving user problems requires going beyond what they say to understand what they actually do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Complexity Is Killing Your Adoption Rates
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;More features, smarter models, and cutting-edge techniques don’t equal better solutions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The more you refine your AI model, the more complex it becomes—making it harder for users to trust and adopt. This creates a vicious cycle:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Users struggle to engage.&lt;/li&gt;
&lt;li&gt;Teams assume the tool isn’t good enough.&lt;/li&gt;
&lt;li&gt;They add more features or refine the model further.&lt;/li&gt;
&lt;li&gt;Complexity increases, adoption stalls, and the cycle repeats.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s the cost of complexity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Harder to maintain and iterate on.&lt;/li&gt;
&lt;li&gt;Higher cognitive load for users.&lt;/li&gt;
&lt;li&gt;Increased risk of failure in real-world scenarios.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To break the cycle, you need to &lt;strong&gt;focus on clarity and simplicity.&lt;/strong&gt; Not because they’re easier, but because they’re harder to achieve—and far more valuable.&lt;/p&gt;




&lt;h3&gt;
  
  
  How to Stop Building Smarter Models and Start Solving Real Problems
&lt;/h3&gt;

&lt;p&gt;If your project feels stuck, it’s time to redefine what progress means. Progress isn’t about improving the tool—it’s about solving the user’s problem.&lt;/p&gt;

&lt;p&gt;Here’s how:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Write Down What You Think Progress Looks Like
&lt;/h4&gt;

&lt;p&gt;Before making your next improvement, write down the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;What’s the specific user problem I’m solving?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Does this change directly impact user outcomes?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;If I stopped improving the model today, could I still deliver value?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If you’re answering “no” to any of these, step back. Refining the tool isn’t the solution.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Replace Metrics With User Outcomes
&lt;/h4&gt;

&lt;p&gt;Metrics like accuracy and precision are helpful—but they’re supporting indicators, not success metrics. True progress comes from measurable user outcomes.&lt;/p&gt;

&lt;p&gt;Focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adoption: Are users consistently engaging with the tool?
&lt;/li&gt;
&lt;li&gt;Efficiency: Are tasks faster or easier for users?
&lt;/li&gt;
&lt;li&gt;Satisfaction: Are users returning or recommending the tool?
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If your changes don’t improve these outcomes, they aren’t real progress.&lt;/strong&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Simplify Like Your Users’ Success Depends On It
&lt;/h4&gt;

&lt;p&gt;Simplification isn’t a shortcut—it’s a strategy for delivering faster, more meaningful results.  &lt;/p&gt;

&lt;p&gt;Ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;What’s the simplest way to solve my users’ most critical problem?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;What features or complexities can I remove to increase clarity and trust?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Simplifying doesn’t mean doing less—it means doing what matters most.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Shift That Will Make or Break Your AI Project
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AI projects don’t fail because teams lack ambition or expertise&lt;/strong&gt;. They fail because they mistake technical progress for success. Tutorials, metrics, and frameworks create momentum—but without a clear connection to user outcomes, they lead you in circles.&lt;/p&gt;

&lt;p&gt;By focusing on user problems over technical improvements, you’ll stop building for the sake of the tool and start building for the people who use it.&lt;/p&gt;

&lt;h3&gt;
  
  
  A New Definition of Progress
&lt;/h3&gt;

&lt;p&gt;Next time you’re tempted to tweak your model, ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Am I solving the right problem—or just improving the tool?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;What’s the simplest way to deliver value today?&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;&lt;em&gt;If I removed complexity, would it improve adoption?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The best AI solutions aren’t the most advanced. They’re the ones users can’t imagine working without. Build for that.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Does this resonate with your AI journey? I’d love to hear your thoughts or challenges in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>💬 How Intent-Driven Interfaces Will Transform the Way Users Interact with Software</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Mon, 30 Dec 2024 22:04:58 +0000</pubDate>
      <link>https://dev.to/louis-dupont/how-intent-driven-interfaces-will-transform-the-way-users-interact-with-software-3fdo</link>
      <guid>https://dev.to/louis-dupont/how-intent-driven-interfaces-will-transform-the-way-users-interact-with-software-3fdo</guid>
      <description>&lt;p&gt;Most software interfaces are frustrating. Users are forced to navigate complex menus, follow rigid workflows, and adapt to systems that weren’t built with their needs in mind.&lt;/p&gt;

&lt;p&gt;But what if software could adapt to &lt;strong&gt;you&lt;/strong&gt;? What if it could understand your intent and deliver outcomes without friction?&lt;/p&gt;

&lt;p&gt;This is the promise of &lt;strong&gt;intent-driven interfaces&lt;/strong&gt;—a transformative shift that empowers software to act on user intent seamlessly. By leveraging techniques like tool calling, these interfaces eliminate unnecessary complexity and focus on what truly matters: delivering results.&lt;/p&gt;

&lt;p&gt;In this article, I’ll explore:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Why traditional software design creates friction.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How intent-driven interfaces solve these challenges.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How you can start building them today.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Problem: Traditional Software Forces Users to Adapt
&lt;/h2&gt;

&lt;p&gt;Most software systems prioritize &lt;strong&gt;processes over people.&lt;/strong&gt; They assume users will adapt to the system’s structure—learning workflows, navigating menus, and performing repetitive actions. While this approach has worked historically, it creates unnecessary friction in modern workflows.&lt;/p&gt;

&lt;p&gt;Here’s what this looks like in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;E-commerce:&lt;/strong&gt; Customers struggle to find order details or update delivery addresses.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Field Services:&lt;/strong&gt; Technicians lose time inputting data into clunky systems.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logistics:&lt;/strong&gt; Workers manually search for shipment information, delaying operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These inefficiencies stem from a fundamental flaw: software is designed to operate like a machine, not like a human assistant. Users don’t want to “figure out” a system. They want results.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: Make Software Understand Intent
&lt;/h2&gt;

&lt;p&gt;Intent-driven interfaces change everything. Instead of requiring users to navigate and adapt, they let users express their intent in natural language. The system handles the rest.&lt;/p&gt;

&lt;p&gt;Here’s how this works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Users describe what they need.&lt;/strong&gt; No menus or forms—just a simple request.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The system interprets intent.&lt;/strong&gt; Using technologies like Large Language Models (LLMs), the system identifies the appropriate action.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It delivers the outcome.&lt;/strong&gt; The system executes the required function and provides the result in real time.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Imagine you’re a customer interacting with an e-commerce system. Instead of navigating menus, you type:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Where’s my latest order?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The system identifies your intent (check order status), calls the appropriate function (&lt;code&gt;check_order_status&lt;/code&gt;), and returns a clear response:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Your order is out for delivery and will arrive tomorrow.”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No complexity. No friction. Just results.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Are Intent-Driven Interfaces?
&lt;/h2&gt;

&lt;p&gt;At their core, intent-driven interfaces rely on &lt;strong&gt;tool calling&lt;/strong&gt;—a technique that connects natural language processing with actionable software functions.&lt;/p&gt;

&lt;p&gt;Here’s how tool calling works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User Input:&lt;/strong&gt; The user provides a request in natural language (e.g., “Update my delivery address to 123 Elm Street”).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intent Matching:&lt;/strong&gt; The system identifies the most relevant function to execute.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parameter Mapping:&lt;/strong&gt; The system generates structured input for the function (e.g., JSON):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"update_address"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"order_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12345&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
           &lt;/span&gt;&lt;span class="nl"&gt;"new_address"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"123 Elm Street"&lt;/span&gt;&lt;span class="w"&gt;
       &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Function Execution:&lt;/strong&gt; The function runs, returning the result.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Response:&lt;/strong&gt; The system translates the result into a user-friendly response.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This seamless interaction transforms complex workflows into effortless exchanges.&lt;/p&gt;
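&lt;p&gt;The five-step loop above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the intent-matching and parameter-mapping steps are stubbed with keyword rules, where a real system would delegate them to an LLM with function calling, and &lt;code&gt;update_address&lt;/code&gt; / &lt;code&gt;check_order_status&lt;/code&gt; are hypothetical example functions.&lt;/p&gt;

```python
# Minimal sketch of the tool-calling loop described above.
# The intent matcher is stubbed with keyword rules; in practice an LLM
# with function calling produces the function name and parameters.
# update_address and check_order_status are hypothetical examples.

def update_address(order_id: int, new_address: str) -> str:
    return f"Order {order_id} will now ship to {new_address}."

def check_order_status(order_id: int) -> str:
    return f"Order {order_id} is out for delivery."

TOOLS = {"update_address": update_address,
         "check_order_status": check_order_status}

def match_intent(user_input: str) -> dict:
    # Stand-in for steps 2-3 (intent matching + parameter mapping).
    if "address" in user_input.lower():
        return {"function": "update_address",
                "parameters": {"order_id": 12345,
                               "new_address": "123 Elm Street"}}
    return {"function": "check_order_status",
            "parameters": {"order_id": 12345}}

def handle(user_input: str) -> str:
    call = match_intent(user_input)                         # steps 1-3
    result = TOOLS[call["function"]](**call["parameters"])  # step 4: execute
    return result                                           # step 5: respond

print(handle("Update my delivery address to 123 Elm Street"))
```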




&lt;h2&gt;
  
  
  Why Intent-Driven Interfaces Matter for Your Business
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Faster Task Completion&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With intent-driven systems, users spend less time navigating and more time achieving. This efficiency reduces operational costs and improves satisfaction.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Simpler Onboarding&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Forget training manuals. Natural language interactions mean users can start using the system immediately, without a steep learning curve.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Better ROI on AI Investments&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Many AI projects fail because they focus on complexity instead of outcomes. Intent-driven interfaces prioritize measurable results, ensuring your AI delivers real value.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Enhanced Retention&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When systems adapt to users, rather than the other way around, they create positive experiences that keep customers and employees engaged.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building Intent-Driven Interfaces: A Practical Guide
&lt;/h2&gt;

&lt;p&gt;Ready to make the shift? Here’s how to get started:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Identify High-Impact Use Cases
&lt;/h3&gt;

&lt;p&gt;Ask: &lt;em&gt;What are the most frequent or frustrating tasks users need to perform?&lt;/em&gt;&lt;br&gt;&lt;br&gt;
Examples:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checking order statuses.
&lt;/li&gt;
&lt;li&gt;Updating delivery information.
&lt;/li&gt;
&lt;li&gt;Logging job updates.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 2: Build Modular APIs
&lt;/h3&gt;

&lt;p&gt;Each action needs a corresponding function. Design these with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clear input/output structures&lt;/strong&gt; (e.g., JSON).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security measures&lt;/strong&gt; to prevent unauthorized use.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity:&lt;/strong&gt; Focus on one action per function.
&lt;/li&gt;
&lt;/ul&gt;
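&lt;p&gt;As a sketch of what such a modular function might look like (names and payload shape are invented for illustration), note how it does one thing, validates its input, and returns a structured result:&lt;/p&gt;

```python
# Sketch of a single-action API function with a clear JSON-like
# input/output contract and basic input validation. The function name
# and payload fields are hypothetical examples.

def update_address(payload: dict) -> dict:
    order_id = payload.get("order_id")
    new_address = str(payload.get("new_address", "")).strip()
    if not isinstance(order_id, int) or not new_address:
        # Reject malformed input instead of guessing.
        return {"ok": False,
                "error": "order_id (int) and new_address are required"}
    # ... persist the change in your order system here ...
    return {"ok": True, "order_id": order_id, "new_address": new_address}

print(update_address({"order_id": 12345, "new_address": "123 Elm Street"}))
```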

&lt;h3&gt;
  
  
  Step 3: Integrate With an LLM
&lt;/h3&gt;

&lt;p&gt;Choose a Large Language Model that supports function calling. Connect it to your APIs using a controller that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maps user queries to the correct function.
&lt;/li&gt;
&lt;li&gt;Handles ambiguous requests gracefully.
&lt;/li&gt;
&lt;li&gt;Returns actionable results.
&lt;/li&gt;
&lt;/ul&gt;
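&lt;p&gt;A controller in this spirit might look like the sketch below. The LLM call is stubbed out so the example is self-contained; with a real model you would pass the registered function schemas and let its function-calling output choose the name and arguments. All function names here are hypothetical.&lt;/p&gt;

```python
# Sketch of a controller between the LLM and your APIs.
# llm_pick_function is a stub standing in for a real LLM with
# function calling; the registry pattern and names are illustrative.

FUNCTIONS = {}

def register(name):
    def wrap(fn):
        FUNCTIONS[name] = fn
        return fn
    return wrap

@register("check_order_status")
def check_order_status(order_id: int) -> str:
    return f"Order {order_id} is out for delivery."

def llm_pick_function(query: str) -> dict:
    # Stub: a real LLM would map the query to one of FUNCTIONS.
    return {"name": "check_order_status", "arguments": {"order_id": 12345}}

def controller(query: str) -> str:
    choice = llm_pick_function(query)
    fn = FUNCTIONS.get(choice["name"])
    if fn is None:
        # Unknown tool: fail gracefully instead of crashing.
        return "Sorry, I can't help with that yet."
    return fn(**choice["arguments"])

print(controller("Where's my latest order?"))
```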

&lt;h3&gt;
  
  
  Step 4: Test and Optimize
&lt;/h3&gt;

&lt;p&gt;Run extensive tests to ensure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Functions are invoked correctly.
&lt;/li&gt;
&lt;li&gt;Outputs are accurate and user-friendly.
&lt;/li&gt;
&lt;li&gt;Edge cases (e.g., vague inputs) are handled smoothly.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Overcoming Common Challenges
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Ambiguous User Requests&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Users don’t always phrase requests clearly.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Solution:&lt;/strong&gt; Build fallback mechanisms to ask clarifying questions when intent is unclear.&lt;/p&gt;
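&lt;p&gt;One way to sketch such a fallback: score the candidate intents, and if nothing scores confidently, ask a clarifying question instead of guessing. The scorer, threshold, and intent names below are all invented for illustration; a real system would get the scores from the LLM or a classifier.&lt;/p&gt;

```python
# Sketch of a clarifying-question fallback. The keyword scorer and the
# 0.5 threshold are placeholder assumptions for illustration only.

def rank_intents(query: str) -> list:
    # Stand-in scorer; a real system would use an LLM or classifier.
    q = query.lower()
    scores = [("check_order_status", 0.9 if "order" in q else 0.2),
              ("update_address", 0.9 if "address" in q else 0.2)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def route(query: str, threshold: float = 0.5):
    intent, score = rank_intents(query)[0]
    if score < threshold:
        # No confident match: ask instead of guessing.
        return ("clarify",
                "Did you want to check an order or update an address?")
    return ("execute", intent)

print(route("hello?"))             # ambiguous input
print(route("Where is my order?"))
```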

&lt;h3&gt;
  
  
  &lt;strong&gt;API Security&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Allowing systems to execute functions introduces risks.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Solution:&lt;/strong&gt; Implement strict authentication and input validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Complex Workflows&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Complex workflows can overwhelm the system.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Solution:&lt;/strong&gt; Start small—focus on high-value, low-complexity tasks first.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Future of User Interfaces
&lt;/h2&gt;

&lt;p&gt;Intent-driven interfaces aren’t just about better user experiences—they’re about transforming how software delivers value. By focusing on outcomes and eliminating unnecessary friction, these systems create a future where interacting with software feels effortless.&lt;/p&gt;

&lt;p&gt;If your systems are frustrating users—or if your AI initiatives aren’t delivering ROI—it’s time to rethink your approach. Intent-driven interfaces might just be the solution you need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Whether you’re starting from scratch or looking to improve an existing AI solution, let’s connect!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>rag</category>
    </item>
    <item>
      <title>Turn Your Broken Chatbot 🚧 - Into Your Biggest Asset 📈</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Sat, 28 Dec 2024 11:49:07 +0000</pubDate>
      <link>https://dev.to/louis-dupont/turn-your-broken-chatbot-into-your-biggest-asset-6c6</link>
      <guid>https://dev.to/louis-dupont/turn-your-broken-chatbot-into-your-biggest-asset-6c6</guid>
      <description>&lt;p&gt;You launched your chatbot, and… well, it’s not going as planned. Users are confused, workflows feel disjointed, and your team’s enthusiasm is quickly waning. Sound familiar?&lt;/p&gt;

&lt;p&gt;Here’s the good news: your chatbot isn’t just failing—it’s revealing what matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every awkward interaction or frustrated user is a clue&lt;/strong&gt;. The gaps in your bot’s performance mirror the gaps in your understanding of user needs. And those gaps? They’re opportunities.&lt;/p&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/louis-dupont/the-chatbot-trap-why-your-llm-project-is-stuck-after-the-wow-moment-39mg"&gt;recent post&lt;/a&gt;, I explained why so many AI projects fall short—teams jump straight into building chatbots without asking whether they’re the right solution for the problem at hand. Often, they’re not. But even a “failing” chatbot can become a powerful diagnostic tool for understanding user needs more deeply.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead of rushing to fix your chatbot, pause and listen. Your bot might be the discovery tool you didn’t know you had.&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;The Opportunity in Frustration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When your chatbot misses the mark, it’s tempting to see it as a failure. But every misstep is packed with lessons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Unclear Questions:&lt;/strong&gt; When users struggle to articulate their needs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Misunderstood Intents:&lt;/strong&gt; When the bot interprets queries incorrectly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unsupported Workflows:&lt;/strong&gt; When the bot misses critical scenarios.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key is reframing failure as feedback. These points of friction highlight what matters most to your users. Let’s explore how to extract actionable insights and make your bot a tool for growth.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Friction Is Your Best Friend&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The more your chatbot struggles, the more it teaches you about your users. Friction isn’t a bug—it’s a signal. The trick is knowing how to prioritize what matters.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Prioritize Smartly&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Use this simple framework to categorize issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Severity:&lt;/strong&gt; Does this problem block users from achieving their goals?
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frequency:&lt;/strong&gt; How often do users encounter this issue?
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Value:&lt;/strong&gt; What’s the ROI of solving it?
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Focus on high-severity, high-frequency problems first. These are the bottlenecks that, when removed, unlock the biggest wins.&lt;/p&gt;
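&lt;p&gt;Even a back-of-the-envelope score makes this triage concrete. The sketch below multiplies the three factors to rank issues; the issue names, scales, and numbers are invented for illustration, and you would tune the weighting to your own context.&lt;/p&gt;

```python
# Toy triage of chatbot issues by severity x frequency x value.
# All issue names and numbers are invented for illustration.

issues = [
    {"name": "bot misreads dates",   "severity": 3, "frequency": 40, "value": 2},
    {"name": "no answer on refunds", "severity": 5, "frequency": 25, "value": 5},
    {"name": "typo in greeting",     "severity": 1, "frequency": 90, "value": 1},
]

def score(issue):
    # Blocking, common, valuable problems rank highest.
    return issue["severity"] * issue["frequency"] * issue["value"]

for issue in sorted(issues, key=score, reverse=True):
    print(score(issue), issue["name"])
```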

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Go Beyond Features: Design for Scenarios&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Users don’t care what your chatbot can do—they care what it helps them achieve. Shift your mindset from features to outcomes.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;The Scenario Shift&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feature Thinking:&lt;/strong&gt; “The bot summarizes reports.”
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scenario Thinking:&lt;/strong&gt; “The bot compares two contracts and highlights key differences in seconds.”
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anchor every bot capability in a real-world scenario. Start by asking: &lt;em&gt;What does success look like for the user?&lt;/em&gt; Then, design your bot to deliver that outcome seamlessly.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Treat Data Like a Discovery Engine&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Your chatbot’s data isn’t just a performance report—it’s a map of user needs. Each query, complaint, or abandoned session points to something valuable.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;What to Look For&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Recurring Queries:&lt;/strong&gt; What are users asking most frequently?
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Abandonment Points:&lt;/strong&gt; Where do users give up?
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Gaps:&lt;/strong&gt; What tasks are users trying to complete that the bot doesn’t support?
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, if users repeatedly ask the same follow-up question, it’s a sign they need clearer answers upfront. Patterns like this reveal where to focus your efforts.&lt;/p&gt;
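&lt;p&gt;Mining these signals from your logs can start very simply. The sketch below counts recurring queries and computes an abandonment rate over toy session data; the log schema is an invented example, and your own logging format will differ.&lt;/p&gt;

```python
# Sketch: mine chat logs for recurring queries and abandonment.
# The session structure and data are toy examples for illustration.
from collections import Counter

sessions = [
    {"queries": ["where is my order", "where is my order status"],
     "completed": False},
    {"queries": ["refund policy"], "completed": True},
    {"queries": ["where is my order"], "completed": False},
]

# Recurring queries point to the most common user needs.
recurring = Counter(q for s in sessions for q in s["queries"])

# Abandonment rate: sessions where the user gave up before finishing.
abandon_rate = sum(not s["completed"] for s in sessions) / len(sessions)

print(recurring.most_common(1))
print(f"abandonment: {abandon_rate:.0%}")
```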

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Know When to Pivot&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Not every problem needs a chatbot solution. Sometimes, the smartest move is to shift gears entirely.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;When to Pivot&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Highly Specific Outputs:&lt;/strong&gt; If users repeatedly request precise, formatted results (like reports or comparisons), a dashboard might be a better fit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ambiguous Queries:&lt;/strong&gt; If users struggle to phrase questions, structured workflows or forms could reduce friction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pivoting doesn’t mean failure—it means aligning the solution to the problem. Your goal isn’t to save the chatbot; it’s to deliver the best user outcome.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Reframe Frustration as Opportunity&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Your chatbot’s struggles aren’t an ending—they’re a beginning. Here’s your roadmap to turn challenges into breakthroughs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Focus on friction points to identify critical user needs.
&lt;/li&gt;
&lt;li&gt;Shift from feature-building to scenario design.
&lt;/li&gt;
&lt;li&gt;Treat user data as a lens into behavior and workflows.
&lt;/li&gt;
&lt;li&gt;Pivot when necessary to ensure the right solution for the right problem.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Your chatbot isn’t just a tool—it’s a feedback loop&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;The insights it provides can guide you to solutions that truly resonate with users. Whether that means refining the bot or pivoting entirely, the real value lies in what you learn along the way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Ready to turn your chatbot’s struggles into strategic wins? Let’s talk.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>llm</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>🤷‍♂️ ModernBERT Is Here - and It’s Not Just Another LLM Update</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Fri, 20 Dec 2024 13:20:09 +0000</pubDate>
      <link>https://dev.to/louis-dupont/modernbert-is-here-and-its-not-just-another-llm-update-3fo6</link>
      <guid>https://dev.to/louis-dupont/modernbert-is-here-and-its-not-just-another-llm-update-3fo6</guid>
      <description>&lt;p&gt;BERT is back - and this time, it’s &lt;strong&gt;faster, smarter, and built for the tasks that matter.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;If you’re working on &lt;strong&gt;retrieval&lt;/strong&gt;, &lt;strong&gt;classification&lt;/strong&gt;, or &lt;strong&gt;code search&lt;/strong&gt;, encoder models like BERT have likely been your go-to. Generative LLMs may grab headlines, but &lt;strong&gt;when it comes to focused, production-ready AI tasks, BERT still shines&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Earlier this year, I ran an experiment comparing models on a real-world task—analyzing product reviews. The results were eye-opening:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4o hit 91% accuracy with a cost of $1.40 per 1,000 reviews.
&lt;/li&gt;
&lt;li&gt;After fine-tuning, Phi-3 mini matched GPT’s accuracy but ran locally, taking 2.7 seconds per review.
&lt;/li&gt;
&lt;li&gt;But the real surprise? &lt;strong&gt;6-year-old BERT&lt;/strong&gt; hit &lt;strong&gt;97% accuracy&lt;/strong&gt;, processing each review in just 0.03 seconds.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This showed me that while LLMs excel at text generation and versatility, &lt;strong&gt;BERT dominates when you need precision and speed.&lt;/strong&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  Why ModernBERT Is a Big Deal
&lt;/h3&gt;

&lt;p&gt;ModernBERT takes everything that made the original BERT great and levels it up:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3x faster inference speeds.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8k token context length&lt;/strong&gt; (vs. 512)—perfect for full-document retrieval.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trained on code&lt;/strong&gt;, unlocking large-scale code search and smarter IDE tools.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Generative models won’t replace what encoder models like BERT do best. If you’re building systems that need structured outputs, retrieval pipelines, or highly targeted classification, this release is worth your attention.  &lt;/p&gt;

&lt;p&gt;And for the full details on ModernBERT: &lt;a href="https://huggingface.co/blog/modernbert" rel="noopener noreferrer"&gt;https://huggingface.co/blog/modernbert&lt;/a&gt;  &lt;/p&gt;

</description>
      <category>rag</category>
      <category>genai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>🪤 The Chatbot Trap - Why Your LLM Project Is Stuck After the “Wow Moment"</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Thu, 19 Dec 2024 15:38:03 +0000</pubDate>
      <link>https://dev.to/louis-dupont/the-chatbot-trap-why-your-llm-project-is-stuck-after-the-wow-moment-39mg</link>
      <guid>https://dev.to/louis-dupont/the-chatbot-trap-why-your-llm-project-is-stuck-after-the-wow-moment-39mg</guid>
      <description>&lt;p&gt;&lt;em&gt;Your LLM prototype amazed everyone—until it didn’t. Now it’s stuck, and no one’s using it. Here’s why.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When most companies experiment with AI, &lt;strong&gt;the go-to application is a chatbot&lt;/strong&gt;. It’s intuitive, it looks impressive, and it feels like magic. But here’s the cold, hard truth: &lt;strong&gt;chatbots are why most LLM projects fail.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’ve seen it happen countless times. The team builds a chatbot to “harness AI,” and at first, it wows everyone. But then the cracks start to show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Users are frustrated. The chatbot gives incomplete answers or none at all.&lt;/li&gt;
&lt;li&gt;Adoption stalls. People revert to their old workflows.&lt;/li&gt;
&lt;li&gt;The project drags on, with no measurable impact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Eventually, the chatbot gets shelved. The technology gets blamed. The lesson learned? &lt;em&gt;“AI isn’t ready yet.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Wrong.&lt;/p&gt;

&lt;p&gt;The problem isn’t AI. The problem is that you’ve fallen into &lt;strong&gt;the chatbot trap.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s break down what’s going wrong—and how to finally get your LLM project unstuck.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Most LLM Projects Fail After the Prototype&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. You’re Building a Tool, Not Solving a Problem&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Think about it: Why did your team decide to build a chatbot? Chances are, the conversation started with, &lt;em&gt;“We need to use AI,”&lt;/em&gt; instead of, &lt;em&gt;“What pain point are we solving?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Here’s the truth: &lt;strong&gt;users don’t care about chatbots.&lt;/strong&gt; They care about results. They want outcomes that make their work easier, faster, or less frustrating.&lt;/p&gt;

&lt;p&gt;Take this example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A consulting team is buried under a mountain of documents. They want to retrieve information faster.&lt;/li&gt;
&lt;li&gt;Someone suggests, &lt;em&gt;“Let’s build a chatbot so they can ask questions and get answers!”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;A prototype is built. It kind of works, but it’s clunky. Users struggle to phrase questions correctly, and the answers aren’t specific enough.&lt;/li&gt;
&lt;li&gt;After months of iteration, the chatbot fizzles out. Users move on. The team is back to square one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What went wrong? No one stopped to ask, &lt;em&gt;“What outcome does the user actually want?”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this case, the consultants didn’t want to chat—they wanted &lt;strong&gt;structured, actionable insights.&lt;/strong&gt; Imagine if the AI automatically generated a report with key information upfront:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No back-and-forth.&lt;/li&gt;
&lt;li&gt;No guessing how to phrase the question.&lt;/li&gt;
&lt;li&gt;Just the answers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Suddenly, the AI is solving the real problem. And as a bonus, it’s much simpler to build and measure.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Open Systems Create Chaos&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Chatbots let users ask anything. Sounds great, right? Until you realize the chaos it creates.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What questions will users ask?&lt;/li&gt;
&lt;li&gt;How will they phrase them?&lt;/li&gt;
&lt;li&gt;What edge cases will they uncover?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This lack of constraints makes chatbots an &lt;strong&gt;open system&lt;/strong&gt;—and open systems are a nightmare to measure or improve. How do you evaluate success when the scope is infinite?&lt;/p&gt;

&lt;p&gt;You can’t.&lt;/p&gt;

&lt;p&gt;Compare that to a &lt;strong&gt;closed system&lt;/strong&gt;, like generating a predefined report or extracting specific data. In a closed system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You know exactly what the output should be.&lt;/li&gt;
&lt;li&gt;You can measure accuracy, recall, and completeness.&lt;/li&gt;
&lt;li&gt;And because you can measure it, you can improve it.&lt;/li&gt;
&lt;/ul&gt;
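&lt;p&gt;This is what makes closed systems tractable: with a known expected output, evaluation reduces to a direct comparison. A toy sketch, using an invented extraction task and invented data:&lt;/p&gt;

```python
# Sketch: evaluating a closed system is a direct comparison against
# known expected outputs. Task and data are toy examples.
expected  = {"doc1": "2024-01-15", "doc2": "2024-02-01", "doc3": "2024-03-10"}
predicted = {"doc1": "2024-01-15", "doc2": "2024-02-02", "doc3": "2024-03-10"}

accuracy = sum(predicted[d] == expected[d] for d in expected) / len(expected)
print(f"accuracy: {accuracy:.0%}")  # measurable, so improvable
```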

&lt;p&gt;Here’s the rub: Chatbots feel magical, but from an engineering perspective, they’re chaos.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Chatbots Set Users Up for Disappointment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When you give someone a chatbot, you’re promising: &lt;em&gt;“Ask me anything, and I’ll give you the perfect answer.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But what happens when the chatbot responds with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“I’m sorry, I don’t understand that.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“I can’t help with that.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users get frustrated. Trust is destroyed.&lt;/p&gt;

&lt;p&gt;Now imagine a &lt;strong&gt;simpler, clearer solution&lt;/strong&gt;—a button labeled &lt;em&gt;“Generate Report”&lt;/em&gt; or a dashboard that delivers exactly what the user needs. Expectations are set upfront, and the experience feels seamless.&lt;/p&gt;

&lt;p&gt;Here’s the rule: &lt;strong&gt;The simpler the solution, the clearer the expectations—and the better the user experience.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How to Escape the Chatbot Trap&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If your LLM project is stuck, it’s time to rethink your approach. The key? &lt;strong&gt;Shift your mindset from “build something impressive” to “deliver outcomes that matter.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here’s how:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Start with the Problem&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Ask yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What pain point are we solving?&lt;/li&gt;
&lt;li&gt;What outcome does the user actually need?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your answer starts with, &lt;em&gt;“We’re building a chatbot,”&lt;/em&gt; stop. Chatbots are tools, not outcomes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Constrain the Scope&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Avoid the temptation to build something that can “do it all.” Narrow your focus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What specific task will the AI handle?&lt;/li&gt;
&lt;li&gt;What won’t it handle?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Smaller scope = less complexity = faster success.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Build Closed, Measurable Systems&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Focus on systems with clear boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically summarize documents.&lt;/li&gt;
&lt;li&gt;Generate predefined reports.&lt;/li&gt;
&lt;li&gt;Extract specific data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Closed systems are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easier to measure.&lt;/li&gt;
&lt;li&gt;Faster to improve.&lt;/li&gt;
&lt;li&gt;More likely to deliver value.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;When Is a Chatbot the Right Solution?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let’s be clear: Chatbots aren’t useless. In &lt;strong&gt;narrow, well-defined use cases&lt;/strong&gt;, they can work brilliantly. But those use cases are the exception, not the rule.&lt;/p&gt;

&lt;p&gt;Before building a chatbot, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What’s the scope?&lt;/strong&gt; Can we define clear boundaries?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What’s the expectation?&lt;/strong&gt; Will users understand its limitations?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What’s the outcome?&lt;/strong&gt; Are we solving a real, measurable problem?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In most cases, a &lt;strong&gt;simpler, structured solution&lt;/strong&gt; will deliver more value, faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Bottom Line: Users Want Outcomes, Not Tools&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If your team is stuck in the chatbot trap, here’s the harsh truth: &lt;strong&gt;people don’t care about your chatbot.&lt;/strong&gt; They care about getting the information they need—quickly, easily, and with zero friction.&lt;/p&gt;

&lt;p&gt;So, instead of chasing flashy, complex tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deliver a report with exactly what they need.&lt;/li&gt;
&lt;li&gt;Build a dashboard that surfaces key insights in seconds.&lt;/li&gt;
&lt;li&gt;Focus on outcomes, not interfaces.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you do this, two things happen:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Users love it.&lt;/strong&gt; They trust the solution because it delivers value.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You can measure success.&lt;/strong&gt; And if you can measure it, you can improve it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AI doesn’t need to feel magical to be valuable. The best AI solutions often feel simple—like they “just work.”&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If your LLM is stuck in the chatbot trap, let’s get it back on track. I’ve helped teams rethink their AI strategy and deliver real, measurable results. Drop me a message, and let’s talk.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>genai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Why it's time to re-examine the quick vs. clean code debate</title>
      <dc:creator>Louis Dupont</dc:creator>
      <pubDate>Thu, 19 Dec 2024 15:11:18 +0000</pubDate>
      <link>https://dev.to/louis-dupont/the-quick-code-controversy-why-its-time-to-re-examine-the-quick-vs-clean-code-debate-4dmj</link>
      <guid>https://dev.to/louis-dupont/the-quick-code-controversy-why-its-time-to-re-examine-the-quick-vs-clean-code-debate-4dmj</guid>
      <description>&lt;p&gt;The quick code controversy: Why it's time to re-examine the quick vs. clean code debate&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding what strategy to follow based on the context.
&lt;/h2&gt;

&lt;p&gt;As developers, we often find ourselves faced with the dilemma of choosing between writing quick code or taking the time to write clean, maintainable code. This decision can be especially tricky when it comes to one-off scripts, prototypes, or internal tools that may not have a long lifespan. On the one hand, we want to get the job done as efficiently as possible. On the other hand, we don't want to create a mess that will be difficult to maintain or understand in the future.&lt;/p&gt;

&lt;p&gt;In this article, we will explore the trade-offs between quick code and good code in different contexts, and provide strategies for striking the right balance. We'll cover topics such as prototyping, production code, libraries, and standalone scripts, and provide examples of when to prioritize speed, readability, testability, and other important considerations.&lt;br&gt;
 &lt;br&gt;
Whether you're writing code for a short-term project or building something that will be used for years to come, this article will provide you with valuable insights and strategies for writing good code in any context.&lt;/p&gt;

&lt;h2&gt;
  
  
  I. Prototyping code
&lt;/h2&gt;

&lt;p&gt;Prototyping code is an essential tool for developers, allowing them to quickly test and validate ideas before committing to a full implementation. When writing prototyping code, it's important to prioritize speed and flexibility over maintainability and robustness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Focus on speed&lt;/strong&gt; - Prototyping code is typically written quickly, so it's important to prioritize speed over perfection. Don't worry too much about code quality or architecture - the goal is to get something working as quickly as possible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep it simple and flexible&lt;/strong&gt; - Prototyping code is often used to explore different approaches to a problem, so it's important to keep your code flexible and open to change. This means avoiding rigid or complex architectures and focusing on simplicity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't worry too much about testing&lt;/strong&gt; - While testing is important, it's not a top priority when it comes to prototyping code. You should still perform some basic testing to ensure your code is working as expected, but don't spend too much time on it.&lt;/li&gt;
&lt;/ul&gt;
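&lt;p&gt;As a hypothetical illustration (the function and data below are invented for this article, not a real API), a prototype might hardcode its inputs, skip error handling entirely, and rely on a quick print instead of a test suite:&lt;/p&gt;

```python
# Prototype: quick exploration of an idea, not production-ready.
# Hardcoded data, no error handling, no tests beyond a sanity check.

def score_candidates(candidates):
    # Naive first attempt: longer names score higher. Good enough
    # to see whether the overall pipeline idea works at all.
    return sorted(candidates, key=len, reverse=True)

candidates = ["alpha", "beta", "gammagamma"]  # hardcoded sample data
ranked = score_candidates(candidates)
print(ranked)  # quick visual sanity check instead of a test suite
```

&lt;p&gt;The point is not quality: the naive ranking can be thrown away or rewritten entirely once the overall idea is validated.&lt;/p&gt;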

&lt;h2&gt;
  
  
  II. Production code
&lt;/h2&gt;

&lt;p&gt;When it comes to production code, the focus shifts from speed and flexibility to reliability, maintainability, and robustness. Production code is the code that powers your applications and services, so it's important to make sure it is of high quality and can handle the demands of a production environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Make it reliable&lt;/strong&gt; - Production code needs to be reliable, with minimal downtime and minimal errors. This includes thorough testing, error handling, and performance optimization to ensure the code is stable and can handle a high volume of requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it maintainable&lt;/strong&gt; - Production code needs to be easy to maintain, with clear and concise code, well-documented functions and modules, and a consistent coding style. This helps other developers understand and work with the code, and makes it easier to update and improve over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it scalable&lt;/strong&gt; - Production code needs to be scalable, with the ability to handle a high volume of requests and a large number of users. This includes optimization techniques, such as caching and load balancing, to ensure the code can handle the demands of a live environment.&lt;/li&gt;
&lt;/ul&gt;
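&lt;p&gt;A minimal sketch of what these priorities look like in practice (the pricing function below is invented for illustration): input validation that fails loudly, logging for traceability, and a small cache for hot requests:&lt;/p&gt;

```python
import logging
from functools import lru_cache

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pricing")

@lru_cache(maxsize=1024)  # caching: avoid recomputing frequent requests
def compute_discount(total_cents):
    """Return the discount in cents for an order total.

    Raises ValueError on invalid input instead of failing silently,
    so errors surface early and are easy to trace in the logs.
    """
    if total_cents < 0:
        raise ValueError(f"order total cannot be negative: {total_cents}")
    # Hypothetical business rule: 10% off orders of $100 or more.
    discount = total_cents // 10 if total_cents >= 10_000 else 0
    log.info("discount for %d cents computed: %d", total_cents, discount)
    return discount
```

&lt;p&gt;None of this makes the function faster to write, but it makes failures visible, behavior documented, and repeated requests cheap, which is exactly the trade production code demands.&lt;/p&gt;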

&lt;h2&gt;
  
  
  III. Libraries
&lt;/h2&gt;

&lt;p&gt;When it comes to writing code for libraries, the focus is on maintainability, readability, and usability. Libraries are reusable pieces of code that are used by other developers in their own projects, so it's important to make sure they are easy to understand and use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Make it reliable&lt;/strong&gt; - Library code needs to be reliable, with minimal bugs and predictable behavior. This includes thorough testing, error handling, and performance optimization to ensure the code is stable and can be used in a variety of different projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document it well&lt;/strong&gt; - Library code needs to be well-documented, with clear and concise documentation that explains how to use the code and any specific requirements or dependencies. This helps other developers understand and use the code, and makes it easier to integrate into their projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it easy to use&lt;/strong&gt; - Library code needs to be easy to use, with a clear and intuitive interface that makes it easy for other developers to incorporate into their projects. This includes providing usage examples and ensuring the code is well-structured and easy to understand.&lt;/li&gt;
&lt;/ul&gt;
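&lt;p&gt;For example, a small library function (hypothetical, written here for illustration) might pair a narrow, intuitive interface with a docstring that spells out arguments, return value, and a usage example:&lt;/p&gt;

```python
def chunk(items, size):
    """Split *items* into consecutive lists of at most *size* elements.

    Args:
        items: any sequence (list, tuple, string, ...).
        size: maximum chunk length; must be a positive integer.

    Returns:
        A list of lists, preserving the original order.

    Example:
        chunk([1, 2, 3, 4, 5], 2)  ->  [[1, 2], [3, 4], [5]]
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]
```

&lt;p&gt;A user can tell from the docstring alone what the function accepts, what it returns, and how it fails, without ever reading the implementation.&lt;/p&gt;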

&lt;h2&gt;
  
  
  IV. Standalone scripts
&lt;/h2&gt;

&lt;p&gt;Standalone scripts are written to perform a specific task or set of tasks on demand, and are typically designed to be run once or a few times rather than continuously. These scripts are used in a variety of contexts, such as automating data processing or generating reports. When writing a standalone script, consider the following guidelines to ensure it is efficient, reliable, and easy to use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Keep it simple and flexible&lt;/strong&gt; - Standalone scripts should be simple, as they are usually meant to run quickly and accomplish a specific task. Keeping the code clean and easy to understand makes it easier to maintain and modify later. It's also worth considering potential future use cases, and designing the script so it can be easily modified or extended.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document it&lt;/strong&gt; - Standalone scripts should be well-documented, with clear comments explaining how the code works and why certain design decisions were made. This helps other developers understand and use the code if needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it easy to run&lt;/strong&gt; - Standalone scripts should be easy to run, with clear instructions and minimal setup requirements, such as an automated installation step and a short usage note.&lt;/li&gt;
&lt;/ul&gt;
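&lt;p&gt;As a sketch (the report script below is hypothetical), a standalone script can combine a usage docstring with &lt;code&gt;argparse&lt;/code&gt; so it is self-documenting and easy to run from the command line:&lt;/p&gt;

```python
"""Print the most common words in a text file.

Usage:
    python word_report.py input.txt --top 5
"""
import argparse
from collections import Counter


def top_words(text, n):
    """Return the n most common words with their counts."""
    return Counter(text.lower().split()).most_common(n)


def main():
    parser = argparse.ArgumentParser(description=__doc__)
    # nargs="?" keeps the script safe to invoke with no arguments:
    # it prints usage instead of crashing.
    parser.add_argument("path", nargs="?",
                        help="text file to analyse")
    parser.add_argument("--top", type=int, default=10,
                        help="number of words to report (default: 10)")
    args = parser.parse_args()
    if args.path is None:
        parser.print_usage()
        return
    with open(args.path, encoding="utf-8") as f:
        text = f.read()
    for word, count in top_words(text, args.top):
        print(f"{word}: {count}")


if __name__ == "__main__":
    main()
```

&lt;p&gt;The docstring doubles as the help text, and &lt;code&gt;argparse&lt;/code&gt; generates usage instructions automatically, so a future reader can run the script without reading its source.&lt;/p&gt;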

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;It is essential to consider the purpose and context of the code being written in order to effectively balance the trade-off between quick and good code. By carefully considering the specific goals of the code, developers can create efficient and reliable solutions that meet the needs of their project.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping code&lt;/strong&gt; should prioritize exploration and learning rather than production-readiness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production code&lt;/strong&gt; should be stable and scalable to handle the demands of a production environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Library code&lt;/strong&gt; should be reusable and reliable, and easily integrable into other projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standalone scripts&lt;/strong&gt; should be simple, flexible, and easy to use, and should be designed to accomplish a specific task.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>cleancode</category>
      <category>mvp</category>
    </item>
  </channel>
</rss>
