Mohammed Farmaan.
Working with LLMs: A 50/50 Effort.

This article was originally published on my personal blog: https://farmaan.dev/writing/working-with-llms/

Hiee! Welcome to another post. This time, it’s not me ranting about things or people I don’t like, but about one of my favorite things: tech. I’m hoping to write a practical guide on working with LLMs.

And this post is not how a typical LinkedIn AI grifter portrays working with AI (getting 100 things done at once by running 25 agents in parallel, or running 50 agents in the background to build a billion-dollar SaaS). I know that sounds ridiculous, but they've been doing it for a while now, just to farm engagement, of course.

This post is as real as it can be and is about working with Large Language Models every day as a software engineer. If put in a fancier way, it's about working with AI. So let's get going, shall we?

Okay, so it seems quite easy to work with LLMs, right? All you have to do is just prompt and it will get the job done. Well, that's true, but only if you want mediocre results.

A wise man once said: "A human can give you mediocre results, but a human using AI wrong can give you worse." - me, 2025

Anyway, enough small talk. Let's see how we can actually work with LLMs the way God intended. I have a few topics in mind which I believe cover the entire arc of this post. So let's start with "prompting":

Prompting: Mediocre, Detailed & Refined.

Everything starts with a prompt when working with LLMs, and it shouldn't be a surprise that it's the prompt itself that can make or break the end result. And IMO, there are three types of prompting: "mediocre, detailed & refined." I'm not going to be extremely detailed about these because they're already descriptive enough.

Anyway, as the name suggests, a mediocre prompt is the most low-effort sentence that you can feed to an LLM, assuming it will just somehow read your mind and give you the right results. Here's an example of a mediocre prompt in the context of software engineering:

```
Add chat reactions in the chat section.
```

That's it. That's the prompt. You're basically asking the LLM to do the most with these few words, which means it not only has to do the extra work of gathering the context itself, but will also likely produce an over-engineered, mediocre end result. You could argue that this is the whole point of LLMs: getting the most out of as little effort as possible. Well, yes, but not in the context of software engineering.

The same prompt would be fine in a different context, such as writing an email for sick leave. The LLM will do that job effortlessly because it doesn't need much context there. It's a simple task, and LLMs are already good at those.

Anyway, the next one is a "detailed" prompt. As the name suggests, these prompts are usually detailed. Not bragging, but since I also write in general, I'm pretty good at writing detailed prompts and I'm also well aware of how things work in my codebase. So I can almost always write a prompt that makes an "LLM feel at home," which is quite important because the last time I heard, they have feelings too, so please make them feel at home.

Here's an example of a detailed prompt:

```
Add chat reactions in the chat section. Here's the relevant context for you to work with: @app/chat @api/chat

The chat section is a React component that displays a list of messages. Each message has a timestamp, user avatar, and message content. I need to add a reactions feature that allows users to react to messages with emoji (like 👍, ❤️, 😂). The reactions should:

1. Be displayed below the message content
2. Show the count of each reaction type
3. Allow users to click to add/remove their reaction
4. Use the existing API endpoint POST `/api/messages/{id}/reactions`

Look at how the message component is structured and add the reactions UI alongside the existing reply button. Keep the styling consistent with the current design system (using Tailwind classes from the rest of the component).
```

Big difference, right? Yeah, sure, but you know what? Sometimes even detailed prompts aren't good enough. And that's where the third type comes in: "refined" prompts. A refined prompt is basically a detailed prompt that has been fed back through an LLM to make it even more specific and clear. I know it sounds a bit weird, but here's how it works:
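The mechanics are simple: you hand the detailed prompt back to an LLM and ask it to tighten it up. The exact wording of that meta-prompt doesn't matter much; something along these lines does the job (this phrasing is just an illustration, not a formula):

```
Refine the following prompt for a coding agent. Make every requirement explicit
and unambiguous, structure it with headings and lists, and do not invent
requirements I didn't ask for.

<paste the detailed prompt here>
```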

I'm just going to feed the same detailed prompt from above, and here's how the refined version looks:

```
Add emoji reactions to the chat section. Relevant modules: **@app/chat**, **@api/chat**.

The chat section is a React component that renders a list of messages. Each message currently includes a timestamp, user avatar, and message content. Extend the message UI to support **emoji reactions** (e.g., 👍 ❤️ 😂).

## Requirements

1. **Display reactions** directly below the message content, aligned with the existing layout.

2. Each reaction should show:
   * The emoji
   * A count of how many users selected it

3. Users should be able to **toggle** each reaction:
   * Clicking adds their reaction if not present
   * Clicking again removes their reaction

4. Use the existing API:
   * `POST /api/messages/{id}/reactions`
   * Request body should include the emoji/reaction type and user ID if required

5. Integrate reactions in the message component **next to the existing reply button**.

6. Follow the current design system:
   * Use Tailwind classes consistent with the surrounding component styles.

7. Make sure the component:
   * Updates optimistically on click
   * Handles reaction updates coming from the server (if applicable)

## Additional guidance

* Review the structure of the `Message` component to understand where to insert the reactions block.
* Ensure the reactions area is compact and visually consistent with the rest of the message UI.
* Use a small set of common emoji reactions (hardcoded list is fine unless otherwise configured).
```
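Just to make the payoff concrete, here's roughly the kind of component a prompt like that tends to produce. This is only an illustrative sketch: the component name, props, and request body are all assumptions, and the real result depends entirely on your codebase:

```
import React from "react";

// All names here are hypothetical; adjust to your own codebase.
interface Reaction {
  emoji: string;
  count: number;
  reactedByMe: boolean;
}

interface ReactionBarProps {
  messageId: string;
  initialReactions: Reaction[];
}

export const ReactionBar: React.FC<ReactionBarProps> = ({
  messageId,
  initialReactions,
}) => {
  const [reactions, setReactions] = React.useState(initialReactions);

  const toggle = async (emoji: string) => {
    // Optimistic update: flip the local state first, then sync with the server.
    setReactions((prev) =>
      prev.map((r) =>
        r.emoji === emoji
          ? {
              ...r,
              reactedByMe: !r.reactedByMe,
              count: r.count + (r.reactedByMe ? -1 : 1),
            }
          : r
      )
    );

    // Assumes the endpoint from the prompt; the request body shape is a guess.
    await fetch(`/api/messages/${messageId}/reactions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ emoji }),
    });
  };

  return (
    <div className="mt-1 flex gap-1">
      {reactions.map((r) => (
        <button
          key={r.emoji}
          onClick={() => toggle(r.emoji)}
          className={`px-2 py-0.5 rounded-full text-sm ${
            r.reactedByMe ? "bg-blue-100" : "bg-gray-100"
          }`}
        >
          {r.emoji} {r.count}
        </button>
      ))}
    </div>
  );
};
```

Anyway, back to the prompts themselves.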

See the difference? The refined version is way more specific about what the LLM needs to do, where it needs to look, etc. So that's how a refined prompt can make the LLM perform much better. Anyway, assuming the importance of prompting is now clear, let's move on to the next section "context":

Context: Manual vs. Automatic.

A wise man once said, "Context is King." Except this time, the wise man is not me. But anyway, if you're doing any serious work, context pretty much means everything when working with an LLM. Without it, you will just end up with half-baked answers, overly confident wrong results, and a tangled mess.

While I have to say that LLMs have gotten pretty good at grepping context automatically, feeding them context manually still goes a long way, especially when you want fewer mistakes and less mess, which means more accurate results and reduced token usage. Manual context also makes the output more predictable, which matters a lot when you are building or making changes to real systems.

Since I'm writing this in the context of software engineering, there are multiple ways you can feed context to an LLM. The easiest way to do so is to just attach the relevant files, folders, or docs and let the LLM derive the context from it.

This works well because the LLM now has direct access to the files and folders it needs to work with, so it will give you much more accurate results. In this case, you are telling the LLM to stop guessing and focus solely on the given context, which reduces the scope for errors upfront.

The other way to feed context is mostly verbal. So instead of attaching relevant files and folders manually, we just verbally tell the LLM where and which parts of the codebase to look.

Now, since LLMs are already good at grepping context automatically, your instructions guide the model further and help it produce better results. This approach trades some certainty for speed and flexibility, and works well when you are still exploring or are not fully sure what matters yet.

The latter is what most vibe-coders use because they can't code or don't know what's in their codebase yet. In practice, you will almost always end up using a mix of both. Explicit context when correctness matters the most, and auto-discovery when you just want to get the job done without hand-holding the LLM on every step.

As an engineer, you should know that you can only get so far with only one of these, so it's always a good thing to use both depending on the context.
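To make the contrast concrete, here's the same request fed both ways (the paths are made up):

```
# Manual context: attach the exact files and folders
Add chat reactions in the chat section. Relevant context: @app/chat @api/chat

# Verbal context: point it in the right direction
Add chat reactions in the chat section. The chat UI lives somewhere under the
app folder and the message endpoints are in the API layer. Find the message
component and the reactions endpoint and work from there.
```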

Behavior: Configuration + Steering.

Configuration

As the name suggests, this involves configuring an LLM using a config file. Don’t confuse this with configuring LLM inference parameters, which is a completely different kind of configuration that end users usually don’t touch.

The type of config I’m talking about is more like rules. These act as a middle layer and affect the LLM’s final output. They tell the model not to make changes based purely on its own conclusions, but to follow a defined set of guidelines whenever it needs to change something.

Now there are multiple ways this can be configured, and LLM providers have tried various approaches, including a markdown file that corresponds to their LLM, such as CLAUDE.md, GEMINI.md, AGENTS.md, and so on. If you're using an editor like Cursor, it has .cursor/rules/**.md. I'm just waiting for the day when we can all decide on a single format and call it a day.

I still believe AGENTS.md is the way to go, but for now let's focus on rules, which in the end is just a markdown file with a set of rules for the LLM to follow before it makes a change. This section is quite descriptive, so let's look at a simple example where we make an LLM perform the same task with and without the config.

Imagine you're working with a React codebase and you ask an LLM to add a new button component. Without any config, the LLM might create a button with inline styles, import unnecessary dependencies, and not follow your project's naming conventions. Here's what you might get:

```
// (Button.tsx) Without config - mediocre result
export const Button = ({ text }) => {
  return (
    <button style={{ padding: '10px', backgroundColor: '#007bff' }}>
      {text}
    </button>
  );
};
```

But if you have an AGENTS.md file that specifies your rules, like:

```
- Use Tailwind CSS for styling (never inline styles)
- Follow PascalCase for component names
- Use lowercase for filenames
- Use TypeScript with proper type definitions
- Use the app's color scheme for default styling
```

The LLM will produce something much better:

```
// (button.tsx) With config - much better result
interface ButtonProps extends React.ButtonHTMLAttributes<HTMLButtonElement> {
  children: React.ReactNode;
}

export const Button: React.FC<ButtonProps> = ({ children, ...props }) => {
  return (
    <button 
      className="px-4 py-2 bg-blue-600 hover:bg-blue-700 text-white rounded-lg"
      {...props}
    >
      {children}
    </button>
  );
};
```

This is not the greatest example, but you see the difference, right? It is significant. Proper TypeScript types, Tailwind classes, and adherence to your naming conventions. This is how behavioral configuration works in practice, and trust me, it makes a huge difference in your workflow and can save you a lot of headaches upfront.

My rule of thumb when configuring rules is to start with a minimal config file instead of creating a comprehensive file that you don't understand. Every time the LLM makes a decision you don't like, stop right there and edit the config file to include that rule so it doesn't repeat it next time.

This way, in no time you will have a good set of rules tailored to your codebase. You can store that in git and use it in other projects with a few changes here and there. But again, if you want the above method to work, you must know what good project and code defaults are. If you configure an LLM with mediocre rules, you will still end up with mediocre results.
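To give a rough idea of where this lands, a rules file grown that way might look something like this after a few weeks. Every rule here is only an example; yours will reflect the decisions you kept having to correct:

```
# AGENTS.md

- Use Tailwind CSS for styling; never inline styles.
- Use TypeScript with explicit prop interfaces; avoid `any`.
- Follow PascalCase for component names and lowercase for filenames.
- Don't add new dependencies without asking first.
- Don't create new files for single-use helpers; keep them next to their caller.
- Prefer editing existing components over creating parallel ones.
```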

Steering

This section exists because rules alone cannot give you great results. If config were a one-time thing like "set good rules and forget," we would not have this problem in the first place. The core issue is that LLMs are non-deterministic and can still deviate from their intended behavior even with rules in place.

This is where steering comes in, and don't worry, steering is not a technical term. In this context, I'm just using it as a general explanation. When an LLM starts deviating from what you expect, a human needs to step in and steer it toward the right behavior.

It's similar to driving a Tesla where you let Autopilot take over the wheel, but due to changing traffic and road conditions, it might still make a wrong decision unless you take the wheel back. The system exists to assist you, but ultimately, the responsibility still lies with the person driving.

The same concept applies when working with LLMs. Just like road and traffic conditions change constantly, an LLM operates in a non-deterministic environment. Steering means knowing when to pause it, correct assumptions, limit scope, or ask it to explain its reasoning before moving forward.
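In practice, steering is nothing fancier than the messages you type mid-session. A couple of made-up examples of what that looks like:

```
Stop. You're editing files outside @app/chat; this change shouldn't touch the
API layer. Revert those edits and stay within the chat components.

Before writing any code, explain your plan for the reactions toggle in two or
three sentences so I can check your assumptions.
```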

By doing this, you can almost always be certain of deriving better results from an LLM. This is how both steering and configuration together make working with LLMs much better.

Review, Refine, Ship.

Welcome to the final section of this post. Honestly, I’d be impressed if you read this far, because I personally wouldn’t, xD. Anyway, as the heading suggests, this is about the final checks you need to perform before you commit to an LLM’s result. No matter the rules and manual steering, in the end, when you look at what it has given you, it can still feel overwhelming and it might not be up to your standards.

In my usual workflow, this is where I break out of the agent loop. This part is completely OFF LLM. Now it is just me and the code the LLM has given me. And TBH, this is my favorite part of the entire session, because this is where I get to review and judge the LLM for what it has done.

At this stage, the LLM has gotten the thing I wanted working. Now I need to check whether it has done it correctly. A wise man steps in again and says: "Just because it works doesn't mean it is right!" And it is true, I agree with him, because again, I'm that wise man.

Unless you’re a bad engineer who accepts whatever an LLM throws at you, most of the time these results won’t look good enough. Because they usually aren’t. So this is where I review and refine things.

I first check if it has modified the relevant files. I check if it has created unnecessary files just for the sake of SOC (separation of concerns), even if they don’t really separate the concerns. I delete or move them to where they belong. I toggle my diff view and check side by side what it has written, how well it handled errors, messages, conditionals, etc. The general good code stuff. This cycle continues until I feel satisfied enough.

When I hit that stage, I test and commit the results. At this point, I have a fully working feature that adheres to my code standards, without putting in the writing effort myself.

When not to use an LLM

This is a short section to point out that LLMs are costly. They cost money to run, build, and use. You will make your spending much higher if you use them for simple tasks that don’t actually need an LLM, such as removing logs, minor style changes, one-word refactors, or adding and removing comments. You should almost always use your IDE for these things.

By doing this, you outsource the major writing part to the LLM and keep the fixing and refining for yourself, which completes the loop and keeps you satisfied with the end result.

Well, that’s all for this post. I hope you enjoyed reading it. See you in the next one. Have a great day!
