<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Piyush Khandelwal</title>
    <description>The latest articles on DEV Community by Piyush Khandelwal (@piyushpk).</description>
    <link>https://dev.to/piyushpk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3034478%2Fd6ab9a74-7318-4bd6-ac59-0cb94268dfc8.png</url>
      <title>DEV Community: Piyush Khandelwal</title>
      <link>https://dev.to/piyushpk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/piyushpk"/>
    <language>en</language>
    <item>
      <title>Understanding Prompt Engineering and Its Different Types</title>
      <dc:creator>Piyush Khandelwal</dc:creator>
      <pubDate>Sun, 13 Apr 2025 07:47:38 +0000</pubDate>
      <link>https://dev.to/piyushpk/what-is-prompt-enginnering-and-types-of-it--5blm</link>
      <guid>https://dev.to/piyushpk/what-is-prompt-enginnering-and-types-of-it--5blm</guid>
      <description>&lt;h2&gt;
  
  
  Types of Prompts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Alpaca prompt&lt;/li&gt;
&lt;li&gt;INST Format&lt;/li&gt;
&lt;li&gt;ChatML&lt;/li&gt;
&lt;li&gt;Few-Shot&lt;/li&gt;
&lt;li&gt;Zero-Shot&lt;/li&gt;
&lt;li&gt;Chain-of-Thought (CoT) &lt;/li&gt;
&lt;li&gt;Self-Consistency Prompting&lt;/li&gt;
&lt;li&gt;Instruction Prompting&lt;/li&gt;
&lt;li&gt;Direct Answer Prompting&lt;/li&gt;
&lt;li&gt;Persona-based Prompting&lt;/li&gt;
&lt;li&gt;Textual Prompting&lt;/li&gt;
&lt;li&gt;Multimodal Prompting&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Alpaca prompt
&lt;/h2&gt;

&lt;p&gt;The term "Alpaca prompt" comes from a project by Stanford where they fine-tuned Meta’s LLaMA model to follow instructions more like how OpenAI’s models behave. What’s interesting is how they structured their prompts — and that format became known as the Alpaca prompt.&lt;/p&gt;

&lt;p&gt;Basically, when you're writing a prompt for Alpaca, you guide the model by clearly stating what you want it to do. You might include some extra details if needed, and then you let the model respond. It's a bit like saying:&lt;/p&gt;

&lt;p&gt;“Hey, can you do this specific task? Here’s a bit more info if it helps…”&lt;/p&gt;

&lt;p&gt;For example, if you wanted the model to translate a sentence, you’d tell it something like:&lt;/p&gt;

&lt;p&gt;What you want: “Translate this English sentence into French.”&lt;/p&gt;

&lt;p&gt;What to work with: “Where is the nearest restaurant?”&lt;/p&gt;

&lt;p&gt;Then, the model would come back with: “Où est le restaurant le plus proche ?”&lt;/p&gt;

&lt;p&gt;It’s pretty straightforward, but what makes it useful is how this setup helps the model stay focused and give more accurate or useful answers. People who train or fine-tune models often use this style because it’s consistent and easy to adapt for lots of different tasks.&lt;/p&gt;
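&lt;p&gt;As a rough sketch, here's how you might assemble that structure in Python. The preamble and the "### Instruction / ### Input / ### Response" headers follow the commonly cited Stanford Alpaca template; the helper function name is just for illustration.&lt;/p&gt;

```python
# Sketch of the Stanford Alpaca prompt template (instruction + optional input).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def build_alpaca_prompt(instruction, input_text=""):
    """Fill the Alpaca template with a task and optional context."""
    return ALPACA_TEMPLATE.format(instruction=instruction, input=input_text)

prompt = build_alpaca_prompt(
    "Translate this English sentence into French.",
    "Where is the nearest restaurant?",
)
print(prompt)
```

&lt;p&gt;The same template works for almost any task: only the instruction and input change, which is exactly why this format is so easy to adapt.&lt;/p&gt;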

&lt;h2&gt;
  
  
  INST Format
&lt;/h2&gt;

&lt;p&gt;When working with instruction-tuned models like Alpaca or Vicuna, there's a particular way to format your input that helps the model understand exactly what you're asking. It’s often referred to as the INST format — short for "Instruction" format.&lt;/p&gt;

&lt;p&gt;Behind the scenes, it uses special tokens such as &amp;lt;s&amp;gt;, [INST]...[/INST], and &amp;lt;&amp;lt;SYS&amp;gt;&amp;gt;...&amp;lt;&amp;lt;/SYS&amp;gt;&amp;gt; to structure the conversation between the user and the assistant. But if you’re writing one yourself, the format typically looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;s&amp;gt;[INST] &amp;lt;&amp;lt;SYS&amp;gt;&amp;gt;
[Your system message here — optional, but useful for setting the tone or behavior of the model.]
&amp;lt;&amp;lt;/SYS&amp;gt;&amp;gt;

[Your instruction or question here — this is what you want the model to respond to.] [/INST]
[The model's response goes here]

A concrete example:

&amp;lt;s&amp;gt;[INST] &amp;lt;&amp;lt;SYS&amp;gt;&amp;gt;
You are a helpful and friendly assistant.
&amp;lt;&amp;lt;/SYS&amp;gt;&amp;gt;

How do I make spaghetti carbonara? [/INST]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why use the INST format?
&lt;/h3&gt;

&lt;p&gt;This format helps the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand who it is (based on the system prompt)&lt;/li&gt;
&lt;li&gt;Know exactly which part is the user's instruction&lt;/li&gt;
&lt;li&gt;Stay within structured conversations (helpful for chatbots and fine-tuning)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, even though it looks a bit technical, it’s really just a way to give the model context and direction in a clean, consistent way.&lt;/p&gt;
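&lt;p&gt;If you want to build this format programmatically, here's a minimal Python sketch. It only handles the [INST] ... [/INST] markers; the surrounding special tokens shown above are typically added by the model's tokenizer, so they're left out here, and the helper name is illustrative.&lt;/p&gt;

```python
# Minimal sketch: wrap a user instruction in [INST] ... [/INST] markers.
# The surrounding special tokens are usually added by the model's
# tokenizer, so this sketch only handles the instruction markers.
def build_inst_prompt(instruction, system=None):
    parts = []
    if system:
        parts.append(system.strip())
    parts.append(instruction.strip())
    body = "\n\n".join(parts)
    return "[INST] " + body + " [/INST]"

prompt = build_inst_prompt(
    "How do I make spaghetti carbonara?",
    system="You are a helpful and friendly assistant.",
)
print(prompt)
```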
&lt;h2&gt;
  
  
  ChatML
&lt;/h2&gt;

&lt;p&gt;When you're dealing with models like ChatGPT, Claude, or any LLM that supports chat-style interactions, there's often a special format used behind the scenes to structure the conversation. One of the most common is called ChatML.&lt;/p&gt;

&lt;p&gt;Think of ChatML as a simple markup format for multi-turn conversations — where you can clearly separate what the user says, what the assistant replies with, and even what the system (like the developer or trainer) wants to set as the assistant’s behavior.&lt;/p&gt;

&lt;p&gt;Here’s how it typically looks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a helpful assistant.&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;user
What’s the weather like today?&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;assistant
I'm not connected to the internet, so I can't check real-time weather, but I can help you write a Python script that does it!&amp;lt;|im_end|&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  A Quick Breakdown
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&amp;lt;|im_start|&amp;gt;system: This sets the tone or personality of the assistant. It’s usually used once at the beginning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&amp;lt;|im_start|&amp;gt;user: This is where the user’s message goes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&amp;lt;|im_start|&amp;gt;assistant: This is the model's response area.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each block ends with &amp;lt;|im_end|&amp;gt;, which acts like a closing tag.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does this matter?
&lt;/h3&gt;

&lt;p&gt;If you’re fine-tuning a model, simulating a conversation, or just building tools around LLMs, ChatML helps keep everything structured. It’s much easier for the model to understand context, and it gives you more control over how it behaves.&lt;/p&gt;

&lt;p&gt;It’s also the underlying format used in OpenAI’s chat models (like GPT-4) to represent structured messages in a conversation — even if you don’t see it directly when chatting on the website.&lt;/p&gt;
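&lt;p&gt;In practice, you usually don't write the ChatML tokens by hand. Chat APIs expose the same structure as a list of role-tagged messages, and the special tokens are added when the conversation is encoded. A small sketch (the helper names are illustrative):&lt;/p&gt;

```python
# Sketch: ChatML-style conversations are usually exposed to developers as a
# list of role-tagged messages; the special tokens are added at encoding time.
def make_conversation(system_text):
    return [{"role": "system", "content": system_text}]

def add_turn(messages, role, content):
    messages.append({"role": role, "content": content})
    return messages

chat = make_conversation("You are a helpful assistant.")
add_turn(chat, "user", "What's the weather like today?")
add_turn(chat, "assistant",
         "I can't check real-time weather, but I can help you write a "
         "script that does.")

roles = [m["role"] for m in chat]
print(roles)
```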

&lt;h2&gt;
  
  
  Zero-Shot
&lt;/h2&gt;

&lt;p&gt;Zero-shot means asking a language model (like GPT, LLaMA, Claude, etc.) to do something without giving it any examples beforehand. You’re basically saying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Translate this sentence into Spanish: I love learning new languages.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Few-Shot
&lt;/h2&gt;

&lt;p&gt;Few-shot is when you ask an AI model to do a task, but you give it a few examples first so it knows the pattern you’re looking for.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Informal: Gonna grab some food, brb.
Formal: I will go get something to eat. I'll be right back.

Informal: Can u send me that doc asap?
Formal: Could you please send me that document as soon as possible?

Informal: lol that was crazy
Formal:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Chain-of-Thought (CoT)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Concepts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Chain-of-Thought (CoT): The model doesn’t just answer directly — it reasons step by step.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Function Calling: You guide the model to invoke a custom function to fetch real data or perform an operation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Accessing External Data: The model can call functions that interact with the internet, such as APIs (e.g., weather, news, or stock prices).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Chain-of-Thought (CoT) is a prompting technique where, instead of asking the model to just give you the answer, you encourage it to “think out loud.” You're basically telling the model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Don’t just tell me the final result — walk me through the steps you’d take to figure it out.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;CoT is especially powerful for complex reasoning, math, logic problems, and multi-step tasks. By encouraging the model to reason step by step, it’s more likely to arrive at the correct or more thoughtful answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to Trigger CoT?
&lt;/h3&gt;

&lt;p&gt;Just say things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Let’s think step by step.”&lt;/li&gt;
&lt;li&gt;“Explain your reasoning.”&lt;/li&gt;
&lt;li&gt;“Break it down before giving the answer.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This kind of nudge often leads to better and more accurate results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bonus: CoT + Few-Shot = Superpowers
&lt;/h3&gt;

&lt;p&gt;Some people even give few-shot CoT examples first, showing the model how to reason through multiple problems before giving it a new one. That combo can be really powerful for tricky questions.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q: John is taller than Mike. Mike is taller than Sarah. Who is the tallest?
A: John is taller than Mike, and Mike is taller than Sarah. So John is the tallest. Final answer: John.

Q: Alice is older than Bob. Bob is older than Carol. Who is the youngest?
A: Alice &amp;gt; Bob &amp;gt; Carol, so Carol is the youngest. Final answer: Carol.

Q: Tom runs faster than Jerry. Jerry runs faster than Max. Who is the slowest?
A:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
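&lt;p&gt;The few-shot CoT combo can be assembled programmatically. A minimal sketch (the example questions mirror the ones above; the helper name is illustrative):&lt;/p&gt;

```python
# Sketch: assemble a few-shot Chain-of-Thought prompt by pairing worked
# Q/A examples (with visible reasoning) ahead of the new question.
COT_EXAMPLES = [
    ("John is taller than Mike. Mike is taller than Sarah. Who is the tallest?",
     "John is taller than Mike, and Mike is taller than Sarah. "
     "So John is the tallest. Final answer: John."),
    ("Alice is older than Bob. Bob is older than Carol. Who is the youngest?",
     "Alice is older than Bob, who is older than Carol, so Carol is the "
     "youngest. Final answer: Carol."),
]

def build_cot_prompt(question, examples=COT_EXAMPLES):
    blocks = ["Q: {}\nA: {}".format(q, a) for q, a in examples]
    # End with the new question plus the classic CoT trigger phrase.
    blocks.append("Q: {}\nA: Let's think step by step.".format(question))
    return "\n\n".join(blocks)

prompt = build_cot_prompt(
    "Tom runs faster than Jerry. Jerry runs faster than Max. Who is the slowest?"
)
print(prompt)
```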

&lt;h2&gt;
  
  
  Self-Consistency Prompting
&lt;/h2&gt;

&lt;p&gt;Self-Consistency Prompting is a cool technique used to improve the reliability and accuracy of language models by encouraging them to internally verify their answers before providing a final response. It’s kind of like getting multiple opinions and averaging them to get a more trustworthy result.&lt;/p&gt;

&lt;p&gt;In simple terms, Self-Consistency means that the model doesn't just answer a question once and call it a day. Instead, it answers multiple times independently, using different reasoning or processes, and then chooses the most consistent answer across all its attempts.&lt;/p&gt;
&lt;h3&gt;
  
  
  How Does It Work?
&lt;/h3&gt;

&lt;p&gt;Multiple Answers: The model generates multiple answers to the same question (or a few alternative solutions).&lt;/p&gt;

&lt;p&gt;Consistency Check: It checks which answer appears most frequently or aligns best with its reasoning.&lt;/p&gt;

&lt;p&gt;Final Answer: The model picks the most consistent, logically sound answer based on the multiple attempts.&lt;/p&gt;

&lt;p&gt;This approach helps reduce errors that may arise from an outlier or isolated bad reasoning process.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;What’s the result of adding 573 and 876?

 // Without Self-Consistency:
 // Model's answer:
573 + 876 = 1450.

The model just does the calculation once and gives an answer.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;Prompt: "Answer the following question in 3 different ways, and then tell me which one is most consistent: What’s the result of adding 573 and 876?"&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Model's responses:
573 + 876 = 1449
573 + 876 = 1450
573 + 876 = 1449

// The model now has 3 responses. It checks which one appears most often,
// and in this case, 1449 appears twice.

// Final Answer:
The model concludes that 1449 is the most consistent and reliable answer,
which is also the correct sum.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Real-World Applications of Self-Consistency
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Mathematical Reasoning: For complex calculations, where a model might make an arithmetic mistake, Self-Consistency can prevent errors.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decision Making: For questions involving subjective reasoning (e.g., determining the best option out of a set), it can reduce the impact of bias or wrong reasoning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scientific or Technical Queries: When looking for the best conclusion based on multiple lines of reasoning, this technique helps ensure that the most logically consistent answer is chosen.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: Self-Consistency in Complex Reasoning&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"If I have a glass of water, and I pour half of it into another glass, then drink 1/4 of what's left, how much water is in the original glass?"

The model gives multiple answers:

1/2 glass of water left.

3/8 glass of water left.

3/8 glass of water left.

Then, the model chooses the most consistent one (in this case, 3/8 glass): pouring half out leaves 1/2, and drinking 1/4 of that remaining half removes another 1/8, leaving 3/8 of the original water.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
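&lt;p&gt;The voting step at the heart of Self-Consistency is easy to sketch. In a real setup each answer would come from a separate, temperature-sampled model call; here the samples are hard-coded for illustration:&lt;/p&gt;

```python
# Sketch: self-consistency as majority voting over several sampled answers.
# In practice each answer would come from an independent model call with
# temperature sampling; here the samples are hard-coded for illustration.
from collections import Counter

def most_consistent(answers):
    """Return the answer that appears most often across the samples."""
    counts = Counter(answers)
    return counts.most_common(1)[0][0]

samples = ["1449", "1450", "1449"]  # three independent attempts at 573 + 876
final = most_consistent(samples)
print(final)
```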

&lt;h2&gt;
  
  
  Instruction Prompting
&lt;/h2&gt;

&lt;p&gt;Instruction Prompting is a technique where the language model is given clear and direct instructions on how to approach a specific task. The goal is to guide the model to perform a task by explicitly telling it what to do, how to do it, and what the output should look like.&lt;/p&gt;

&lt;p&gt;This approach makes it easier for the model to understand the user's expectations and respond in a more structured, predictable way. It’s highly effective for achieving consistent and accurate results in a wide range of use cases.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt:
"Extract the following details from the text: Name, Age, and City of residence."

Text to extract data from:
"John Doe, aged 28, lives in New York. He works as a software developer and enjoys outdoor activities."

Response:

Name: John Doe

Age: 28

City: New York
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Real-World Use Cases for Instruction Prompting:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Customer Support Automation: You can give instructions like, "Provide a helpful response to the customer's complaint about their late delivery," ensuring a consistent tone and structure in replies.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Content Generation: For generating articles, blog posts, or marketing copy, you can provide instructions like, "Write a 500-word blog post on the benefits of electric cars, focusing on environmental impact and cost savings."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Code Writing: You can instruct the model to, "Write a Python function that accepts two arguments, adds them together, and returns the result."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Processing: Extract or clean data by instructing the model to "Extract all dates in YYYY-MM-DD format from the following text."&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
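&lt;p&gt;For the date-extraction instruction in the last bullet, there's a handy deterministic analog: you can check the model's output against a plain regex pass over the same text. A sketch:&lt;/p&gt;

```python
# Sketch: the "extract all dates in YYYY-MM-DD format" instruction has a
# deterministic analog; a regex pass is a cheap sanity check on model output.
import re

def extract_iso_dates(text):
    """Find all YYYY-MM-DD dates in a piece of text."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

sample = "The audit ran on 2024-03-15 and the follow-up is due 2024-04-01."
dates = extract_iso_dates(sample)
print(dates)
```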

&lt;h2&gt;
  
  
  Direct Answer Prompting
&lt;/h2&gt;

&lt;p&gt;Direct Answer Prompting is a technique where the prompt is framed in a way that explicitly asks for a straightforward answer without requiring the model to reason or provide additional context. The goal is to make the model give you a concise, factual response, typically without elaboration.&lt;/p&gt;

&lt;p&gt;This type of prompting is great when you need clear and unambiguous information quickly. It’s also useful when you want to avoid unnecessary explanations or additional details in the model’s response.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"What is the capital of France?"

Response:
"Paris"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is a simple, direct question, and the model responds directly without further context or elaboration.&lt;/p&gt;



&lt;h2&gt;
  
  
  Persona-based Prompting
&lt;/h2&gt;

&lt;p&gt;Persona-based Prompting is a technique where the model is guided to adopt a specific persona or character to shape its responses. The idea is to assign a distinct personality or set of traits to the model, ensuring that the responses are aligned with a particular tone, style, or behavior.&lt;/p&gt;

&lt;p&gt;This method is especially useful in scenarios where you want the model to mimic certain types of interactions, such as emulating a friendly assistant, a professional advisor, a casual conversational partner, or even a specific character from a story.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a financial advisor with 10 years of experience. Advise the user on the best way to save for retirement.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Real-World Use Cases for Persona-based Prompting:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Customer Support Chatbots: By adopting a friendly, patient, or professional persona, chatbots can provide tailored support based on the user's needs and mood.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Personalized Learning: In educational settings, a persona can help the model explain complex concepts in a way that resonates with the learner, whether that’s through a more casual, friendly tone or a structured, academic style.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Entertainment and Storytelling: Persona-based prompting can be used to create interactive characters or narrators who respond in a specific style, such as a fairy tale character, a wise old sage, or a humorous sidekick.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Virtual Personal Assistants: Voice assistants can take on different personas to make interactions more natural, whether they’re being formal for work-related tasks or friendly for casual inquiries.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also give the model a specific person's name along with samples of their vocabulary, tone, and style, and it will respond in that persona.&lt;br&gt;
Example: "Answer in the style of Amitabh Bachchan, using his speaking and writing data."&lt;/p&gt;
&lt;h2&gt;
  
  
  Textual Prompting
&lt;/h2&gt;

&lt;p&gt;Textual Prompting is a method used to guide a language model in generating specific responses based on well-defined inputs. By crafting clear and focused prompts, you can control the output, ensuring it meets your needs. Whether you’re asking a question, requesting creative work, or providing instructions, textual prompts allow you to shape the model’s behavior and achieve your desired results.&lt;/p&gt;

&lt;p&gt;This approach is versatile and can be applied across various domains, including customer service, content creation, research, and even entertainment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"What are the benefits of exercising regularly?"

Response:
"Exercising regularly can improve cardiovascular health, boost mental well-being, enhance muscle strength, and increase overall energy levels. It can also help with weight management and reduce the risk of chronic diseases."```



## Multimodal Prompting
Multimodal Prompting is a technique where you provide input to a model that includes multiple types of data or media, such as text, images, audio, or video. The model then processes these different input modalities together to generate responses or perform tasks that involve understanding and integrating the various forms of input.

This approach allows for richer, more interactive experiences, as it taps into the model's ability to understand and interpret diverse forms of information simultaneously. For example, a multimodal model might take in both a text description and an image to provide more accurate and context-aware outputs.

Example:


```// Text + Image
Prompt:
"Here is a picture of a dog playing in the park. What breed is it?"
(with an image of a dog playing in the park)

Response:
"Based on the image, the dog appears to be a Golden Retriever. It has a fluffy coat, friendly expression, and the characteristic coloring of this breed."

In this example, the model uses both the text input and the image to generate a response that answers the user’s question.```


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
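&lt;p&gt;As a sketch of what a multimodal request can look like in code: several chat APIs accept a message whose content mixes a text part and an image part. The exact field names vary by provider, so treat the keys and the placeholder URL below as illustrative assumptions.&lt;/p&gt;

```python
# Sketch of a multimodal message payload in the style used by several chat
# APIs (a text part plus an image part). Field names vary by provider, so
# treat these keys as illustrative assumptions, not a definitive schema.
def build_multimodal_message(question, image_url):
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = build_multimodal_message(
    "Here is a picture of a dog playing in the park. What breed is it?",
    "https://example.com/dog-in-park.jpg",  # placeholder URL
)
print([part["type"] for part in msg["content"]])
```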

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>promptengineering</category>
      <category>cot</category>
    </item>
    <item>
      <title>How GPT Works Behind The Scene</title>
      <dc:creator>Piyush Khandelwal</dc:creator>
      <pubDate>Wed, 09 Apr 2025 15:17:38 +0000</pubDate>
      <link>https://dev.to/piyushpk/how-gpt-works-42h8</link>
      <guid>https://dev.to/piyushpk/how-gpt-works-42h8</guid>
      <description>&lt;h2&gt;
  
  
  Transformers: The Cool Trick Behind Chatty AI
&lt;/h2&gt;

&lt;p&gt;Hey, ever wonder how AI—like the one you’re talking to right now—seems to &lt;em&gt;get&lt;/em&gt; you? How it can chat, translate, or even whip up a story without missing a beat? Well, let me spill the beans: it’s all thanks to something called a &lt;strong&gt;Transformer&lt;/strong&gt;. Think of it as AI’s secret weapon for tackling language. I’m not gonna drown you in techy mumbo-jumbo—let’s just break it down like we’re hanging out, maybe sipping some chai (or coffee, no judgment). By the time we’re done, you’ll see why Transformers are such a big deal. Ready? Let’s dive in!&lt;/p&gt;




&lt;h3&gt;
  
  
  So, What’s a Transformer?
&lt;/h3&gt;

&lt;p&gt;Picture this: you’ve got a buddy who’s a wizard with words. They can translate your ramblings into French, finish your half-baked sentences, or even write you a poem about cats. That’s basically what a Transformer is—an AI model that’s crazy good at language stuff. It’s got two main players: the &lt;strong&gt;encoder&lt;/strong&gt; (the part that “gets” what you’re saying) and the &lt;strong&gt;decoder&lt;/strong&gt; (the part that spits out a response). Together, they’re like a tag team, crunching words with some fancy math to make magic happen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1n8yiq5l3ujtok0ckku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1n8yiq5l3ujtok0ckku.png" alt="Image description" width="583" height="797"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Visit this interactive explainer to see how this works graphically:&lt;br&gt;
&lt;a href="https://poloclub.github.io/transformer-explainer/" rel="noopener noreferrer"&gt;https://poloclub.github.io/transformer-explainer/&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  The Bits That Make It Tick
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. Encoder: The Listener&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The encoder’s like that friend who actually &lt;em&gt;hears&lt;/em&gt; you. You say, “I’m craving chai,” and it doesn’t just nod—it digs into the whole sentence, figuring out how “craving” and “chai” vibe together. It turns your words into a secret code (fancy term: vectors) that the AI can play with. And here’s the kicker: there’s usually a stack of encoders, each one sharpening the picture a little more.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real Talk&lt;/strong&gt;: It’s like when you’re telling a story and someone picks up on the juicy details—not just the words, but the &lt;em&gt;point&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenAI token visualizer: &lt;a href="https://tiktokenizer.vercel.app/" rel="noopener noreferrer"&gt;https://tiktokenizer.vercel.app/&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2. Decoder: The Talker&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Once the encoder’s got the gist, the decoder jumps in to reply. Say you’re translating “I love chai” to Spanish—it’s the decoder that goes, “Okay, here’s ‘Amo el chai’ for you.” It builds the answer one word at a time, like a pro storyteller spinning a yarn.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fun Bit&lt;/strong&gt;: Think of it as the friend who takes your idea and runs with it, turning it into something new.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;3. Embeddings: Word DNA&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Words mean nothing to a machine unless you give them a number vibe. Embeddings do that—turning “chai” into a string of numbers that say, “Hey, I’m a cozy drink!” Words like “tea” might get similar numbers, while “rocket” is way out in left field.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Why It’s Cool&lt;/strong&gt;: It’s how the AI knows “chai” and “tea” are buddies, but “chai” and “truck” aren’t.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See embeddings graphically: &lt;a href="https://projector.tensorflow.org/" rel="noopener noreferrer"&gt;https://projector.tensorflow.org/&lt;/a&gt;&lt;/p&gt;
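&lt;p&gt;Here's a toy sketch of the "chai and tea are buddies" idea: cosine similarity over hand-made three-number vectors. Real embeddings have hundreds of dimensions and are learned, not hand-written, but the comparison works the same way.&lt;/p&gt;

```python
# Toy sketch: cosine similarity over hand-made 3-number "embeddings" to show
# why "chai" and "tea" land close together while "truck" does not. Real
# embeddings have hundreds of learned dimensions; these are made up.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vectors = {
    "chai":  [0.90, 0.80, 0.10],
    "tea":   [0.85, 0.75, 0.15],
    "truck": [0.10, 0.20, 0.95],
}

chai_tea = cosine_similarity(vectors["chai"], vectors["tea"])
chai_truck = cosine_similarity(vectors["chai"], vectors["truck"])
print(round(chai_tea, 3), round(chai_truck, 3))
```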

&lt;h4&gt;
  
  
  &lt;strong&gt;4. Positional Encoding: Keeping It Straight&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Here’s a weird twist: Transformers don’t read left-to-right like we do—they see all the words at once. But order matters, right? “I love chai” isn’t “Chai loves me.” So, they slap on positional encoding—little tags that say, “Yo, I’m word #1,” or “I’m word #3.” Keeps things from getting scrambled.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quick Take&lt;/strong&gt;: It’s like numbering your grocery list so “milk” doesn’t swap places with “cereal.”&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;5. Self-Attention: The Smart Connector&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;This is the real MVP. Self-attention lets the AI zoom in on what matters. In “The chai, which is spicy, tastes great,” it links “chai” to “spicy” and “tastes,” so it knows what’s what. It’s like when you’re chatting and suddenly remember a detail from earlier that ties it all together.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: Ever had a convo where someone goes, “Oh yeah, that reminds me…”? That’s self-attention in action.&lt;/li&gt;
&lt;/ul&gt;
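&lt;p&gt;Here's a tiny sketch of scaled dot-product attention over three toy word vectors. Real models use learned query/key/value projections; in this stripped-down version the raw vectors play all three roles.&lt;/p&gt;

```python
# Tiny sketch of scaled dot-product attention over 3 toy word vectors,
# showing how each word's output becomes a weighted blend of all the words.
# Real models use learned query/key/value projections; here the raw vectors
# play all three roles.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    d = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this word to every word, scaled by sqrt(dimension).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in vectors]
        weights = softmax(scores)
        # Blend all value vectors according to the attention weights.
        blended = [sum(w * v[i] for w, v in zip(weights, vectors))
                   for i in range(d)]
        outputs.append(blended)
    return outputs

# "chai", "spicy", "tastes" as made-up 2-d vectors
words = [[1.0, 0.2], [0.9, 0.3], [0.1, 1.0]]
attended = self_attention(words)
print([[round(x, 2) for x in row] for row in attended])
```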

&lt;h4&gt;
  
  
  &lt;strong&gt;6. Multi-Head Attention: Extra Brainpower&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Take self-attention, then give it a power-up. Multi-head attention is like having a crew of pals all eyeballing your sentence from different angles. One’s checking how “chai” ties to “spicy,” another’s linking it to “tastes.” They team up for the full scoop.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Why It Rocks&lt;/strong&gt;: It’s how the AI catches all the little connections we humans take for granted.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;7. Softmax: Picking the Winner&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;When it’s time to choose the next word, the AI doesn’t just wing it. Softmax gives it a probability vibe—like, “After ‘I love,’ there’s a 70% chance it’s ‘chai,’ 20% ‘pizza,’ 10% ‘chaos.’” Then it picks.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fun Example&lt;/strong&gt;: It’s like betting on your friend’s next word in a game of finish-the-sentence.&lt;/li&gt;
&lt;/ul&gt;
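&lt;p&gt;That probability split is exactly what the softmax function computes. A sketch with made-up scores (logits) for the three candidate words:&lt;/p&gt;

```python
# Sketch: softmax turns raw next-word scores (logits) into probabilities
# that sum to 1, like the 70% / 20% / 10% split described above. The logit
# values here are made up for illustration.
import math

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = {"chai": 2.0, "pizza": 0.8, "chaos": 0.1}
probs = softmax(list(logits.values()))
ranked = sorted(zip(logits.keys(), probs), key=lambda kv: kv[1], reverse=True)
print([(word, round(p, 2)) for word, p in ranked])
```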

&lt;h4&gt;
  
  
  &lt;strong&gt;8. Tokenization: Chop Chop&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Before anything happens, the AI chops your sentence into bite-sized bits—tokens. Could be words like “chai” or even “!”—each gets its own ID. It’s like turning your sentence into Lego pieces for the AI to build with.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quick Peek&lt;/strong&gt;: “I love chai!” becomes [“I”, “love”, “chai”, “!”]. Simple, but key.&lt;/li&gt;
&lt;/ul&gt;
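&lt;p&gt;A toy tokenizer that does the "Lego pieces" chop looks like this. Real LLM tokenizers (BPE, SentencePiece) split into subword pieces with numeric IDs, but the chopping idea is the same:&lt;/p&gt;

```python
# Sketch: a toy word-and-punctuation tokenizer. Real LLM tokenizers (BPE,
# SentencePiece) split into subword pieces with numeric IDs, but the idea
# of chopping text into units is the same.
import re

def tokenize(text):
    """Split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("I love chai!")
print(tokens)
```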




&lt;h3&gt;
  
  
  The Creativity Dial: Temperature
&lt;/h3&gt;

&lt;p&gt;Okay, let’s talk something fun—&lt;strong&gt;temperature&lt;/strong&gt;. It’s like the AI’s mood setting. Low temperature? It plays it safe, sticking to obvious answers. High temperature? It gets wild, maybe too wild. Here’s how it shakes out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Low (0.1)&lt;/strong&gt;: “I love chai.” Predictable, solid.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High (1.5)&lt;/strong&gt;: “I love chai-flavored moonbeams.” Uh, what?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Middle (0.7)&lt;/strong&gt;: “I love chai and cozy vibes.” Just right.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like cooking: low temp is following the recipe; high temp is tossing in whatever’s in the fridge.&lt;/p&gt;
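&lt;p&gt;Under the hood, temperature just divides the scores before softmax. Low temperature sharpens the distribution toward the safe pick; high temperature flattens it, which is where the moonbeams come from:&lt;/p&gt;

```python
# Sketch: temperature divides the logits before softmax. Low temperature
# sharpens the distribution toward the top choice; high temperature
# flattens it, giving lower-ranked (wilder) words a real chance.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    exps = [math.exp(x) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.8, 0.1]  # made-up scores for "chai", "pizza", "chaos"
cold = softmax_with_temperature(logits, 0.1)
warm = softmax_with_temperature(logits, 1.5)
print(round(max(cold), 3), round(max(warm), 3))
```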




&lt;h3&gt;
  
  
  How It All Comes Together
&lt;/h3&gt;

&lt;p&gt;Let’s run through it real quick, like a movie montage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You say: “I want chai.”&lt;/li&gt;
&lt;li&gt;AI chops it: [“I”, “want”, “chai”].&lt;/li&gt;
&lt;li&gt;Adds order: 1, 2, 3.&lt;/li&gt;
&lt;li&gt;Turns it to numbers: Vectors, baby!&lt;/li&gt;
&lt;li&gt;Encoder listens: “Oh, they want chai.”&lt;/li&gt;
&lt;li&gt;Decoder talks: Starts building a reply.&lt;/li&gt;
&lt;li&gt;Picks words: Softmax and temperature team up.&lt;/li&gt;
&lt;li&gt;You get: “Chai sounds perfect!”&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  Final Vibes
&lt;/h3&gt;

&lt;p&gt;So, there you go—Transformers in a nutshell! They’re the reason AI can chat, translate, or dream up wild stories. No robotic lectures here—just the good stuff, explained like we’re buddies. If you’re still curious (or confused), hit me up—I’m always up for round two! 😊&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources for further understanding
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=K45s2PgywvI&amp;amp;pp=ygUScGl5dXNoIGdhcmcgdmVjdG9y" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=K45s2PgywvI&amp;amp;pp=ygUScGl5dXNoIGdhcmcgdmVjdG9y&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=K45s2PgywvI&amp;amp;pp=ygUScGl5dXNoIGdhcmcgdmVjdG9y" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=K45s2PgywvI&amp;amp;pp=ygUScGl5dXNoIGdhcmcgdmVjdG9y&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=wjZofJX0v4M&amp;amp;pp=ygUMdHJhbnNmb3JtZXJz" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=wjZofJX0v4M&amp;amp;pp=ygUMdHJhbnNmb3JtZXJz&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=ZhAz268Hdpw&amp;amp;t=9s&amp;amp;pp=ygUMdHJhbnNmb3JtZXJz" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=ZhAz268Hdpw&amp;amp;t=9s&amp;amp;pp=ygUMdHJhbnNmb3JtZXJz&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://research.google/pubs/attention-is-all-you-need/" rel="noopener noreferrer"&gt;https://research.google/pubs/attention-is-all-you-need/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll meet again soon with another topic.&lt;/p&gt;

</description>
      <category>gpt</category>
      <category>transformer</category>
      <category>genai</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
