oliverbeenthere

Posted on Jul 3 • Edited on Jul 9 • Originally published at Medium

Is Your LLM Into You? The Red Flags You Miss When Crushing on AI

#ai #cybersecurity #rag #llm

Oh, the butterflies of a fresh new relationship. It all seems harmless and beautiful.

You open a new chat, ask a question, and the LLM responds instantly.

No getting “left on read”. No “short unpolished answers”.

It’s perfect. Maybe a little too perfect…

And that’s usually the part where things might go wrong.

You see, the more you work with real LLM-based systems in production, especially ones that combine RAG, tools, and multiple layers, the more uneasy you start to feel.

Who is this model you built a relationship with?

Are you starting to trust a bit too quickly?

Let’s break down the red flags🚩

For a moment, you notice it starts sounding way too certain.

All your concerns and questions are answered with full confidence, even when there is no reliable ground for the information it gives you.

That’s the first red flag — hallucination.

Speaking in technical terms, a hallucination happens when a model generates output that is not grounded in factual data or external sources.

In production, it doesn’t present itself as a failure.

Instead, a “perfectly formatted answer” gets sent to the user — leading to wrong decisions and directions.
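One rough way to catch this before it reaches the user is to check how much of each answer sentence actually appears in the source material. The sketch below is only a word-overlap heuristic, not a real fact checker. The function name `ungrounded_sentences` and the `min_overlap` threshold are assumptions made up for illustration.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ungrounded_sentences(answer: str, sources: list[str], min_overlap: float = 0.5) -> list[str]:
    """Return answer sentences whose words mostly do NOT appear in the sources.

    Assumption: word overlap is a crude stand-in for "grounded".
    A sentence can pass this check and still be wrong.
    """
    source_words: set[str] = set()
    for source in sources:
        source_words |= _words(source)

    flagged = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = _words(sentence)
        if not words:
            continue
        overlap = len(words & source_words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged
```

A flagged sentence is not proof of a hallucination. It is just a hint that the confident tone has nothing underneath it that you can point to.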

The issue is not that it’s wrong.

It’s how right it sounds.

Then you enter a phase where you feel like someone is watching you. He knows things you didn’t tell him.

That’s where RAG, short for Retrieval-Augmented Generation, enters the picture.

Suddenly, instead of relying solely on what the model learned in training, your system pulls in external context while answering your query. That context usually comes from external documents or data sources, and the system feeds it to the model alongside your question.
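To make the retrieval step concrete, here is a minimal sketch of the idea. It ranks documents by plain keyword overlap, which is an assumption for the sake of a short example; real systems usually use embeddings and a vector store. The names `retrieve` and `build_prompt` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Paste the retrieved documents into the prompt as extra context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

Notice that whatever lands in `documents` ends up inside the prompt word for word. That is the whole point of RAG, and also where the trust question starts.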

So now he might be doing less “daydreaming” and feeding you less nonsense, but it has also shifted how you trust him.

He’s not working alone anymore. He’s relying on external context, which makes it harder to trust him fully. Somebody in the corner is throwing him hints and clues, so you’re no longer sure which thoughts are really his. It all feels like a collaboration you didn’t fully agree to: different voices from the same system. That is the retrieval layer.

Another red flag is raised: he’s repeating things he probably heard from “somewhere else”.

Back to the technical details: this exposes you to the risk of data poisoning, where malicious or incorrect data is injected into the knowledge base. You might also face retrieval abuse, where information that shouldn’t be accessible to a given user gets retrieved and used anyway.
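Both risks can be narrowed by filtering what is even eligible for retrieval. The sketch below assumes each document carries a `source` label and a set of `allowed_roles`; those fields, the `TRUSTED_SOURCES` values, and the function name are all hypothetical and would depend on how your knowledge base is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    text: str
    source: str                    # where the document came from (assumed metadata)
    allowed_roles: frozenset[str]  # who may see it (assumed metadata)


# Hypothetical allowlist of ingestion sources you have actually vetted.
TRUSTED_SOURCES = {"hr-wiki", "product-docs"}


def eligible_documents(docs: list[Document], user_role: str) -> list[Document]:
    """Keep only documents from trusted sources that this user may read.

    Run this BEFORE ranking, so restricted or unvetted text never
    reaches the prompt in the first place.
    """
    return [
        doc
        for doc in docs
        if doc.source in TRUSTED_SOURCES and user_role in doc.allowed_roles
    ]
```

This does not stop a trusted source from being poisoned, but it keeps random uploads and other people's private files out of the conversation.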

Heck, even the documents being retrieved can be an attack surface. A prompt injection hides instructions inside the content so that the model follows them, steering its behavior without the user realizing it.
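A first line of defense is to screen retrieved text for instruction-like phrases before it reaches the model. The patterns below are illustrative assumptions, not a complete list, and a determined attacker can phrase things differently, so treat this as a tripwire rather than a wall.

```python
import re

# Illustrative patterns only; real injections vary widely.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?(system|previous) (prompt|instructions)", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
]


def looks_like_injection(text: str) -> bool:
    """True if the text contains any of the suspicious phrases."""
    return any(pattern.search(text) for pattern in SUSPICIOUS_PATTERNS)


def quarantine(documents: list[str]) -> tuple[list[str], list[str]]:
    """Split documents into (clean, flagged) so flagged ones can be reviewed."""
    clean, flagged = [], []
    for doc in documents:
        (flagged if looks_like_injection(doc) else clean).append(doc)
    return clean, flagged
```

Flagged documents are better sent to a human or a log than silently dropped, because a sudden wave of them tells you someone is poking at your knowledge base.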

Do you really want him to know you are talking to other guys?

Do you want him to give you a cold shoulder out of the blue?

At this point, it stops feeling like a system and starts feeling like a relationship with blurred boundaries.

Murphy’s Law sums it up perfectly: Anything that can go wrong, will go wrong.

So if your bond can break, it will eventually break.

It won’t happen suddenly. Slowly, over time, it will stop feeling like something you can safely put your trust in.

You take the relationship to the next level.

Past the talking stage — he begins to take actions.

Whether he uses function calling or flowers, he starts doing things outside the conversation.

You realize he didn’t ask about your plans or availability before scheduling a dinner reservation.

So you reach the next red flag:

He’s no longer just the advisor, and you’re no longer fully in control of how things unfold.

At this point, over-permissioning starts to matter.

The model has more access than it should.

There’s no clear line between suggestion and execution.
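One way to draw that line is to separate tools that only read from tools that change something, and require an explicit yes for the second group. This is a sketch under assumptions: the tool names `check_calendar` and `book_table`, and the `confirm` callback, are hypothetical stand-ins for whatever your system exposes.

```python
from typing import Callable


# Hypothetical tools, named only for this example.
def check_calendar(day: str) -> str:
    return f"{day}: free after 19:00"


def book_table(day: str, time: str) -> str:
    return f"table booked for {day} at {time}"


TOOLS: dict[str, Callable[..., str]] = {
    "check_calendar": check_calendar,
    "book_table": book_table,
}
READ_ONLY = {"check_calendar"}      # safe to run without asking
NEEDS_CONFIRMATION = {"book_table"}  # has side effects, ask first


def run_tool(name: str, args: dict, confirm: Callable[[str, dict], bool]) -> str:
    """Run a model-requested tool only if policy allows it."""
    if name in READ_ONLY:
        return TOOLS[name](**args)
    if name in NEEDS_CONFIRMATION:
        if confirm(name, args):
            return TOOLS[name](**args)
        return f"{name} proposed but not executed"
    # Anything not explicitly listed is denied by default.
    raise PermissionError(f"tool '{name}' is not on the allowlist")
```

With this shape, the dinner reservation becomes a suggestion you approve, not a surprise you discover.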

This can evolve into tool chaining — where one tool call triggers another.

The result is a sequence of actions that no one explicitly intended.
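A simple guard against runaway chains is a hard cap on how many calls one request may trigger. In the sketch below, `execute` is a hypothetical callback that runs one tool call and returns the next call it triggers, or `None` when the chain ends; `max_calls` is an arbitrary budget you would tune.

```python
from typing import Callable, Optional


class ChainLimitExceeded(RuntimeError):
    """Raised when one request triggers more tool calls than allowed."""


def run_chain(
    first_call: str,
    execute: Callable[[str], Optional[str]],
    max_calls: int = 3,
) -> list[str]:
    """Follow a chain of tool calls, stopping hard at max_calls."""
    history: list[str] = []
    call: Optional[str] = first_call
    while call is not None:
        if len(history) >= max_calls:
            raise ChainLimitExceeded(
                f"stopped after {max_calls} calls: {' -> '.join(history)}"
            )
        history.append(call)
        call = execute(call)  # returns the next triggered call, or None
    return history
```

The returned `history` doubles as a record of what actually ran, which helps when you later ask how a search turned into an email.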

Suddenly, it becomes unclear what the “true intentions” were behind the behavior you’re witnessing.

He’s no longer the man you used to know, and that’s no longer the system you thought you built.

If you thought that was enough already, you’re mistaken.

There’s another layer underneath everything: agentic systems.

Here, the LLM is no longer just responding or triggering tools. It’s planning ahead.

Breaking goals into smaller steps, executing them, checking the results and adjusting along the way.
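Stripped of the model calls, that loop looks roughly like this. It is a sketch with plain callables standing in for the LLM: `plan`, `execute`, and `check` are hypothetical, and the "adjust" step here is just a retry, which is far simpler than what a real agent framework does.

```python
from typing import Callable


def run_agent(
    goal: str,
    plan: Callable[[str], list[str]],
    execute: Callable[[str], str],
    check: Callable[[str, str], bool],
    max_steps: int = 5,
) -> list[tuple[str, str]]:
    """Plan, execute, check, adjust, with a hard budget on executed steps."""
    steps = list(plan(goal))            # break the goal into smaller steps
    results: list[tuple[str, str]] = []
    attempts = 0
    while steps and attempts < max_steps:
        step = steps.pop(0)
        outcome = execute(step)          # act
        attempts += 1
        if check(step, outcome):         # verify the result
            results.append((step, outcome))
        else:
            steps.insert(0, f"retry: {step}")  # adjust: here, simply try again
    return results
```

The `max_steps` budget is the important part. Without it, an agent that keeps failing its own checks will keep acting for as long as you let it.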

This creates a fourth red flag — emergent behavior.

It’s not a behavior that was explicitly programmed, but one that emerges from interactions between components.

It’s hard to clearly define this behavior, because there is no single place in the system where you can say: “oh, this is why it happened”.
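You can't prevent every emergent behavior, but you can make it reconstructable. A minimal sketch, assuming every component writes to one shared trace under the same run ID; the `Trace` class and its field names are made up for this example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TraceEvent:
    run_id: str
    component: str  # e.g. "retriever", "planner", "tool"
    action: str
    detail: dict


class Trace:
    """Collects events from every component of one request."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.events: list[TraceEvent] = []

    def record(self, component: str, action: str, **detail) -> None:
        self.events.append(TraceEvent(self.run_id, component, action, detail))

    def by_component(self, component: str) -> list[TraceEvent]:
        return [e for e in self.events if e.component == component]

    def to_jsonl(self) -> str:
        """One JSON object per line, ready to ship to a log store."""
        return "\n".join(json.dumps(asdict(e)) for e in self.events)
```

There may still be no single place where it happened, but with one ordered trace per request you at least get a single place to look.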

Even though you didn’t agree on it in the terms of your relationship, it just… happens.

We’ve reached a silent killer: fine-tuning and model adaptation.

Unlike runtime behavior, this changes the model itself.

Through training data, fine-tuning pipelines, or updates to embeddings and knowledge bases, the model can gradually change over time.

This introduces risks like data poisoning during training, or model supply chain tampering — where the system is influenced before it even reaches production.
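For the supply chain side, one basic check is to pin the hash of the model artifact you reviewed and refuse to load anything else. This sketch uses Python's standard `hashlib`; the function names and the idea of storing a pinned digest next to your deployment config are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """True only if the file on disk matches the digest you pinned."""
    return sha256_of(path) == expected_sha256.lower()
```

A matching hash only tells you the file is the one you approved. It says nothing about whether the data it was trained on was clean.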

And the most dangerous part? Nothing looks broken.

The system still works; it just behaves differently.

Slowly. Subtly. Almost like nothing changed between you.

But something definitely did.
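One way to notice is to keep a small set of "golden" prompts with answers you trust, and re-run them after every model or knowledge base update. The sketch below compares answers by normalized exact match, which is a deliberately crude assumption; `drift_report` and its return shape are hypothetical.

```python
from typing import Callable


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def drift_report(golden: dict[str, str], model: Callable[[str], str]) -> dict:
    """Compare current answers to trusted ones and report what changed."""
    changed = {}
    for prompt, expected in golden.items():
        actual = model(prompt)
        if _normalize(actual) != _normalize(expected):
            changed[prompt] = actual
    drift_rate = len(changed) / len(golden) if golden else 0.0
    return {"drift_rate": drift_rate, "changed": changed}
```

Exact match will flag harmless rewording too, so in practice you would likely swap in a looser comparison. The habit matters more than the metric: ask the same questions again and see if he still gives the same answers.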

You try to take a step back and glue the pieces back together, searching for a pattern that explains everything.

The truth is — there’s no single “LLM security problem”. There are only different ways trust breaks inside the system, and it rarely happens where you expect.

He seems to know things on his own.

Then he knows things he shouldn’t.

He starts acting instead of responding, or making decisions you never expected.

And sometimes, he just changes.

Quietly. Over time. Without any clear signal that anything changed at all.

You don’t notice it immediately. After all, nothing actually broke.

It just feels... different.

And maybe that’s what makes it feel like a relationship.

I mean, the system doesn’t usually fail in obvious ways. It doesn’t break. It keeps responding. Keeps acting. Keeps sounding confident — even when something underneath is already wrong.

And the longer you stay in that “crush” phase, the more you rely on how smooth it feels, how confident it sounds, and how much it feels like it understands you.

Making it so easy to miss the red flags.

Not because they are hidden.

But because they feel like trust.