Jaskaran Singh
Six Things I Wish Someone Had Told Me Before I Started Working Inside AI

I did not plan to work in AI.

For five years I was building apps. The kind people actually download and use every day. That was the job. That was the plan. Then the industry shifted and I started paying attention to how these AI tools actually work underneath, not just what they produce on the surface.

Now my job is to test AI. I talk to it, push it, try to break it, and figure out where it goes wrong. I have spent a lot of time watching AI fail in ways that look completely fine at first glance.

And that changes how you see these tools.

Most articles about AI explain these concepts the way a textbook would. Clean, tidy, no mess. I want to explain them the way I actually learned them. Through the moments they surprised me, confused me, or quietly let me down.


1. Tokens: AI Reads in Tiny Bites, Not Full Sentences

[Illustration: a person speaks a full sentence while an AI breaks it into small tokens and processes them one by one]

Imagine tearing a book into individual syllables and handing them to someone one at a time. That is roughly how AI reads your text.

AI does not process words the way you do. It breaks everything into tiny fragments called tokens. "Hamburger" might be three tokens. "Cat" is one. Spaces, punctuation, even parts of words all count. Every single thing you type is being measured.

Why does this matter?

Every AI model has a budget. A maximum number of tokens it can handle at once. The longer your message, the more tokens it uses. The longer the conversation, the closer you get to that ceiling. When you hit it, the oldest parts of the conversation start disappearing. The AI is not getting confused. It is literally running out of room to hold everything.

I first noticed this when I kept getting strange, off-target answers in long conversations. The AI was not going rogue. It had simply forgotten what I said at the beginning because there was no room left to keep it.

The fix I started using was to keep messages focused. Instead of dumping everything at once, I ask about one thing at a time. Shorter, more specific messages get better answers. Not because the AI got smarter, but because I stopped wasting its budget.

Try it yourself: OpenAI's Tokenizer Playground lets you paste any text and see exactly how many tokens it uses. Paste a long email or paragraph you wrote. The number will surprise you.
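
If you would rather see the counting from code, here is a minimal sketch using OpenAI's tiktoken library. The encoding name below is one of tiktoken's built-ins; other models split text differently, so treat the exact counts as illustrative.

```python
# Minimal sketch: counting tokens with OpenAI's tiktoken library.
# pip install tiktoken
import tiktoken

# cl100k_base is one of tiktoken's built-in encodings. Other models
# use other encodings, so the exact counts here are illustrative.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Cat", "Hamburger", "Spaces, punctuation, even parts of words all count."]:
    tokens = enc.encode(text)
    print(f"{text!r} -> {len(tokens)} tokens")
```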


2. Context Window: AI Has a Short-Term Memory Problem

[Illustration: a long document where only the most recent section is highlighted; the AI focuses on the latest information and forgets the earlier parts]

Last year I built a small tool that monitors a government immigration website. Canada's immigration program drops new opportunities without warning. No email, no alert, nothing. Miss it and you wait months. My tool checks the website automatically and sends me a message the moment something new appears.

Early on it had a strange problem.

It would work perfectly for a while. Then it would start acting confused. Alerting me about things it had already reported. Missing obvious updates. Behaving like it had completely forgotten what it was supposed to be doing.

The problem was the context window.

Think of the context window like a sticky note. Everything the AI knows about your current conversation lives on that sticky note. Your questions, its answers, anything you shared. The sticky note has a size limit. Once it is full, the oldest things get erased to make room for new ones. No warning. They just disappear.

My tool was filling up that sticky note with every check it ran. After enough cycles, the original instructions were gone. The AI was working without the information it needed and had no idea.

The fix was simple once I understood the problem. Give the AI only what it needs for each task, not everything that has happened so far. Stop letting the sticky note overflow.
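
In code, the fix looks roughly like this. This is a sketch, not my production tool: count_tokens is a stand-in for a real tokenizer, and the budget number varies by model.

```python
# Rough sketch of keeping the "sticky note" from overflowing: always
# re-send the instructions and the new task, and keep only as much
# recent history as still fits in the budget.

MAX_TOKENS = 8000  # illustrative budget; real limits vary by model


def count_tokens(text: str) -> int:
    # Stand-in estimate. Real code would use the model's tokenizer.
    return len(text) // 4


def build_context(instructions: str, history: list[str], new_task: str) -> str:
    budget = MAX_TOKENS - count_tokens(instructions) - count_tokens(new_task)
    kept = []
    for entry in reversed(history):  # walk newest to oldest
        cost = count_tokens(entry)
        if cost > budget:
            break  # drop this entry and everything older
        kept.append(entry)
        budget -= cost
    return "\n\n".join([instructions, *reversed(kept), new_task])
```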

If you have ever noticed AI forgetting something you mentioned earlier in a long chat, this is exactly what happened. It did not ignore you. It ran out of space.

Go deeper: Anthropic's model overview explains how much text different AI models can hold at once. Worth a look if you use AI for anything involving long documents or back-and-forth conversations.


3. Temperature: The Dial Between Predictable and Creative

[Illustration: a dial ranging from low to high, with AI output becoming more predictable on one side and more creative on the other]

Every time AI writes the next word in a sentence, it is choosing from a list of possible options. Temperature is the setting that controls how adventurous those choices are.

Think of it like ordering coffee.

Low temperature is the AI playing it safe. Always picking the most expected, reliable option. Like a barista who makes your order exactly the same way every single time. Consistent. Dependable. Occasionally boring.

High temperature is the AI experimenting. It reaches for less obvious choices. Sometimes it surprises you with something genuinely better. Sometimes it goes in a direction you did not want at all.

I tested this once by asking the same question twice. Once with a low setting and once with a high one. The low setting gave me a clear, straightforward answer I could use immediately. The high setting gave me something more interesting but also more unpredictable. For some tasks that unpredictability is exactly what you want. For others, like anything factual or precise, it is a problem.

If you are using AI to write something creative or brainstorm ideas, a higher temperature gives you more variety to choose from. If you need a consistent, reliable answer every time, you want it low. Most AI tools do not show you this setting, but knowing it exists explains a lot about why you sometimes get wildly different answers to the same question.
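
If you do get access to the dial, it is usually a single parameter on the API call. Here is a minimal sketch using Anthropic's Python SDK; the model name is a placeholder for whichever model you actually use.

```python
# Minimal sketch of the temperature dial via Anthropic's Python SDK.
# pip install anthropic  (expects ANTHROPIC_API_KEY in the environment)
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder; substitute a current model name

for temperature in (0.0, 1.0):
    message = client.messages.create(
        model=MODEL,
        max_tokens=200,
        temperature=temperature,  # 0.0 plays it safe, 1.0 experiments
        messages=[
            {"role": "user", "content": "Name a dessert and describe it in one sentence."}
        ],
    )
    print(f"temperature={temperature}: {message.content[0].text}")
```

Run the low-temperature call twice and you will usually get near-identical answers. Run the high one twice and you usually will not.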

Quick reference: The Anthropic API docs show how temperature works in practice. Even if you never touch the setting yourself, understanding it makes AI much less mysterious.


4. Hallucination: Confident, Fluent, and Completely Wrong

[Illustration: a user asks a question and the AI gives a confident but incorrect answer, sounding right while being wrong]

This is the one I worry about most.

AI makes things up. Not occasionally, not as a rare edge case. It happens regularly. And the scary part is not that it happens. It is that the wrong answers look exactly like the right ones.

I have seen AI recommend a restaurant that does not exist, in a neighbourhood it described accurately, with opening hours and a menu. Completely invented. Presented like a fact.

I have seen it cite a news article with a real-sounding headline, a plausible publication name, and a date. The article never existed.

The reason this happens is worth understanding. AI is not looking things up. It is not retrieving facts from a database. It is predicting what text should come next based on patterns it has seen. Most of the time those patterns align with reality. Sometimes they do not. And the model cannot always tell the difference, so it does not flag it. It just keeps going, confidently.

In my job testing AI I spend a lot of time specifically looking for this. The ones that fool me are not the obviously wrong answers. Those are easy to catch. The dangerous ones are the answers that are almost right. Right enough to pass a quick read. Wrong in a way that only becomes clear later.

The practical lesson is not to trust AI output on anything important without checking it against a real source. Not because AI is useless. Because this is simply how it works right now.


5. RAG: Teaching AI to Look Things Up Before Answering

[Illustration: a question goes to the AI, then to a knowledge source, and returns as a more accurate answer]

Going back to my immigration monitoring tool. This is where I really understood what RAG does.

RAG stands for Retrieval-Augmented Generation. Ignore the name. The idea is simple.

Instead of asking AI to answer purely from memory, you give it a way to check a source first. It fetches the relevant information, reads it, and then writes its answer based on what it actually just read. Not what it vaguely remembers from training.

My tool works this way. Every time it runs, it does not ask the AI to remember what the immigration website looked like before. It fetches the current page, hands that fresh content to the AI, and asks whether anything has changed. The AI is reading real, current information. Not guessing from memory.
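
Stripped down to the pattern, the loop looks something like this. The URL, the snapshot file, and the model name are placeholders, not the real tool.

```python
# Stripped-down fetch-then-ask pattern: give the AI fresh content to
# read instead of asking it to remember anything.
import requests
import anthropic

client = anthropic.Anthropic()
url = "https://example.com/immigration-updates"  # placeholder URL

current_page = requests.get(url, timeout=30).text
with open("last_snapshot.html") as f:  # saved from the previous run
    previous_page = f.read()

prompt = (
    "Here is the previous version of a web page, then the current version.\n\n"
    f"PREVIOUS:\n{previous_page}\n\n"
    f"CURRENT:\n{current_page}\n\n"
    "List anything that changed. If nothing changed, say 'no changes'."
)

message = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=500,
    messages=[{"role": "user", "content": prompt}],
)
print(message.content[0].text)
```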

This is the difference between asking a friend to answer off the top of their head versus letting them check their notes first. Same person. Much better answer.

This matters because AI's knowledge has a cutoff date. It does not know what happened last week. It does not know what is in your company's internal documents. It does not know anything it was not trained on. RAG is how you fill that gap by giving the AI something real to read before it responds.

It is why some AI tools can accurately answer questions about recent events, or about your specific business, when a general AI tool would just guess.

Start here: Anthropic's contextual retrieval post explains RAG in plain language with real examples. LangChain also has a beginner-friendly tutorial if you want to see how it actually gets built.


6. Prompting: The Way You Ask Changes Everything

[Illustration: a vague prompt next to a clear, detailed prompt, showing how better instructions lead to better responses]

It took me longer than it should have to take this one seriously.

Prompting just means how you phrase your request to an AI. And it changes the answer more than most people realize.

Here is an example anyone can test right now.

Ask AI this. "Write me an email."

Then ask this instead. "Write me a short, friendly email to my landlord asking if I can get a pet cat. Keep it under five sentences. Be polite but direct."

The second one gets you something you can actually send. The first one gets you something generic that you will spend ten minutes editing anyway.

Same AI. Same moment. The only difference is how specific you were.

I learned this from the other side. My job involves writing requests specifically designed to trip AI up and expose its weaknesses. That work taught me something. The more vague your request, the more room the AI has to fill in the gaps with whatever is most common. Most common is rarely most useful.

AI does not read between the lines the way a person does. A colleague who knows you might guess what you mean from a half-finished thought. AI takes your words at face value and produces something plausible for exactly what you wrote. Not more. Not less.

The habit worth building is this. Before you ask AI something, spend thirty seconds making the request more specific. What format do you want? What should it avoid? Who is going to read it? That thirty seconds saves you ten minutes of editing on the other side.
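
One way to make the thirty-second habit stick is to turn those questions into a template. A small sketch; the fields are just the checklist above:

```python
# Small sketch: the thirty-second checklist as a reusable template.
def build_prompt(task: str, audience: str, output_format: str, avoid: str) -> str:
    return (
        f"{task}\n"
        f"Audience: {audience}\n"
        f"Format: {output_format}\n"
        f"Avoid: {avoid}"
    )

print(build_prompt(
    task="Write me an email asking my landlord if I can get a pet cat.",
    audience="My landlord, who I am on friendly terms with.",
    output_format="Under five sentences, polite but direct.",
    avoid="Formal legal language and long apologies.",
))
```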


None of this requires a technical background. Tokens are the budget AI works within. Context windows are its short-term memory. Temperature controls how predictable or experimental it gets. Hallucination is confident wrongness that looks right. RAG is the look-it-up approach. Prompting is just asking better questions.

I learned all of this the slow way. By watching things break and figuring out why after the fact. You do not have to.

The people getting genuinely useful results from AI right now are not using better tools. They just understand what is actually happening when they hit send.


I write about AI, what it gets wrong, and how to use it without getting burned. Follow along on dev.to or connect on LinkedIn.
