If LLMs Were ATMs, Would You Still Count Your Money?

#ai #rag #llm #programming

There is this joke about some Nigerians still counting their money at the ATM. Even with an automated system, people in my home country, Nigeria in West Africa, perform this common practice of counting their cash the moment it comes out of the machine. I have never personally known anyone who got a different amount than what they asked for, but the habit carries on. And I think that habit has something to teach us about how we should be treating LLMs right now.

The buzz word everywhere you turn is AI, and many companies you know of want to quickly get on this new trend. As we rush to embrace this tool to become more productive and make our businesses more efficient, it falls on us as engineers to check what the output of this technology actually gives us. Do not vibe code to the extent that you no longer have control over what is being written. We should learn to do some extra checks, the same way people still count their notes at the ATM even when the machine has never shortchanged them. Trust is earned, and verification is cheap insurance.

So what does "counting your money" look like when the machine is an LLM?
First, what if I told you that agents are simply a set of continually prompting an LLM to do something, sending the output back to itself or to a different, more capable model as a prompt, and performing other actions such as criticism and review? Once you see it that way, it becomes obvious that every step in that chain is a place where things can quietly go wrong. You need checks at each step, not just at the end.

Second, LLMs are still unreliable in some vital areas. We need to put a system and structure in place, with a set of hard rules that the LLM's output is run against. These are not prompt rules, I mean a programmatic set of rules enforced in code. Take access checks as an example. Do not just prompt, "answer this question if the user is an admin". The LLM can fall into prompt injection attacks, where a user slips instructions into their input telling the model to ignore the admin check entirely. Access control belongs in your code, not in your prompt.

Third, evaluate the model's response regularly. When you change models or change any parameters in your pipeline, run the same checks each time to ensure the model has not degraded or drifted from expected or acceptable answers. This can be done with different evals.

And finally, if you vibe code a product, please get a Software Engineer to review it before you hand it over to customers, whether paying or non-paying. Many of the topics I mentioned above can be very technical, and you need someone who knows how code should behave to reliably implement them. Businesses should not be too eager to replace engineers.

The ATM in Nigeria has been around for years, and it still gives people the right amount almost every single time. And still, we count. That instinct is not paranoia, it is discipline. LLMs are newer, less predictable, and far more capable of being wrong in ways you will not notice at a glance. So, count the money, review the code!

DEV Community

If LLMs Were ATMs, Would You Still Count Your Money?

Top comments (0)