This is Part 2 of the Prompt Engineering series, which will put prompt engineering tactics into practice, including model choice and LLM parameters like temperature. Check out Part 1 on the importance of creating effective prompts.
Prompt engineering is the practice of enhancing LLM responses by writing optimized prompts and tuning parameters like temperature or topP. There are many strategies and tactics for creating effective prompts, but how do these translate into real-life situations? Which tactics are the most effective? How should I combine prompt writing and parameter tuning? Let’s explore some real examples of prompt engineering put into practice.
Engineering better LLM responses
OpenAI documentation provides six main strategies for prompt engineering:
- Write clear instructions
- Provide reference text
- Split complex tasks into simpler subtasks
- Give the model time to "think"
- Use external tools
- Test changes systematically
These strategies point to the overarching principle of prompt engineering: providing clear, specific, well-structured instructions. There are many ways to split hairs about how exactly to do that, with numerous guides available online (including our previous Guide to Prompt Engineering, where we introduced the basics of creating clear and effective prompts). What we want to do is to dig deeper into how model choice, prompt writing tactics, and parameter tuning interact and interplay to produce the ideal responses.
In the following sections, we will explore each facet in turn and finally put it all together, so you have a concrete idea of what to focus on when implementing prompt engineering in your AI projects.
Model choice
Without any prompt engineering, your choice of LLM serves as the ringfence determining the base quality of the response output. To illustrate, here is the response by Cohere’s command
model compared to Mistral’s mistral-small
model. Both models were asked to summarize an article about NASA’s Europa Clipper mission:
Summarize ${text}
// Assume that ${text} is a dynamic variable that is injected with the
relevant content at runtime
While both models provide a relatively accurate summary of the article, Mistral offers a well-structured response out of the box without additional prompting.
Depending on your use cases and system requirements, Mistral’s responses could be overly-engineered or the more optimal response. Nevertheless, using prompt engineering tactics, you can get command
to respond just like mistral-small
.
Tactics for effective prompts
To provide clear, specific, well-structured instructions, these are some of the tactics employed in the prompts shown below:
-
Break up the instructions for clarity.
Instead of combining the article and the instructions in a single sentence, the prompt is now divided into distinct sections: one for the content and another for the instructions on how to evaluate the content.
-
Demarcate your prompt with special characters or tags where appropriate.
The article content is now enclosed in quotation marks to explicitly mark the text that should be summarized.
-
Include details about the desired output. Quantify where possible.
In one variation, the prompt now asks for the response to be “clear and concise” and even specifies the desired format of using bullet points. In another, it explicitly states the desired length of the output.
Check out how the responses have improved in each of these prompts.
Prompt: Here is the text from an article: "${text}". Provide a summary
of this text in a clear and concise manner.
Response: NASA has launched the Europa Clipper mission, a $5.2 billion
robotic spacecraft, to explore Jupiter's moon, Europa, which is believed
to have a global ocean of liquid water beneath its icy crust and could
potentially support extraterrestrial life. The mission aims to determine
if Europa has the necessary ingredients for life, including water,
organic compounds, and an energy source. The spacecraft, the largest
and most expensive planetary science mission in NASA's history, will
conduct 49 close encounters with Europa, studying its ocean, atmosphere,
and surface features. Scientists are particularly interested in the
possibility of hydrothermal vents on Europa's ocean floor, which could
support simple life forms. The mission also aims to find potential
landing sites for future lander missions. Europa Clipper's findings will
shape future exploration of Europa and the search for life beyond Earth.
Prompt: Here is the text from an article: "${text}". Provide a summary
of this text in a clear and concise manner, starting with the title
**Summary**, and including the following bullet points:
- **Mission:**
- **Objective:**
- **Spacecraft:**
- **Journey:**
- **Scientific Goals:**
- **Future Missions:**
Response: Summary
- **Mission:** NASA's Europa Clipper mission aims to explore Jupiter's
moon, Europa, a promising location for finding extraterrestrial life.
- **Objective:** The primary goal is to determine if Europa has the
necessary ingredients for life, particularly simple, single-celled
organisms.
- **Spacecraft:** Europa Clipper is the largest and most expensive
planetary science spacecraft NASA has ever launched. It carries a
sophisticated suite of nine instruments to study Europa's ocean,
surface, and atmosphere.
- **Journey:** The spacecraft was launched on a SpaceX Falcon Heavy
rocket and will take a 5.5-year journey to Jupiter. It will fly by Mars
and Earth for gravity assists, reaching Jupiter in 2030.
- **Scientific Goals:** Europa Clipper will study Europa's ocean, ice
shell, atmosphere, and potential plumes to understand its habitability.
It will search for water, organic compounds, and energy sources
necessary for life.
- **Future Missions:** A Europa lander is a challenging prospect, but
a compelling discovery could prompt a follow-up mission. Other high-
priority missions include a Uranus orbiter and a mission to Saturn's
moon Enceladus.
Prompt: Here is the text from an article: "${text}". Summarize the text
clearly in 280 characters using this format "Article summary: <your_
summary_here>".
Response: Article summary: NASA's Europa Clipper mission launched on a
SpaceX Falcon Heavy rocket to explore Jupiter's moon Europa, which may
harbor life. The spacecraft will study Europa's ocean, atmosphere, and
surface, seeking signs of habitability and potential life.
Temperature
In some cases, it is also useful to tune the temperature of the responses. This LLM parameter affects how creative or deterministic the output will be — the higher the temperature, the more random and creative the output will be.
While temperature is associated with creative tasks like generating a poem or a story, it can also apply to factual writing, like summaries. Take a look at this response below, where the temperature is set to 0.1. The text output reuses similar words or phrasing, and this repetitiveness makes the summary rather stale and unengaging.
In contrast, a temperature of 1 leads to more dynamic and lively text, but tends to exaggerate or include less important information. This would mean that the summarizer is prone to creating misleading snapshots of the articles.
A more moderate temperature of 0.4 could be the sweet spot for an article summarizer, providing relevant information with livelier language.
Evaluating your LLM responses
We’ve seen individually how model choice, prompt writing tactics, and LLM parameters can affect your desired output. But how do these factors stack together to get optimized results at scale? While prompt writing tactics get you the biggest mileage in obtaining a particular result, LLM parameter tuning and model choice could be the distinguishing factor between an average response and a stellar one.
Using the following prompt with different models and temperature values:
Here is the text from an article: "${text}". Summarize the text clearly
in 280 characters using this format "Article summary: <your_summary_
here>".
We get the following responses:
At a small sample size, it is easy to pick which answer is more desirable. But in a production setting, it is best practice to systematically test changes using a larger sample size to ascertain between randomness and true improvement.
One way to systematically track changes would be to evaluate the LLM responses against a benchmark — using relevant criteria or gold-standard answers. In this case, we want summaries that are concise yet comprehensive — answering the 5W1H (what, who, where, when, why, and how).
These evaluations can be manually done by humans or AI-assisted. Whichever route you choose, ensure that you can evaluate a large sample set with sufficient variety to cover all the edge cases in your use scenarios.
Optimizing LLM responses for AI app development
In practice, an application that leverages generative AI is likely far more complex than using a simple one-shot prompt. Multiple prompts can be used across different tasks in a process, or AI techniques like retrieval-augmented generated (RAG), computer vision, and other data pre-processing tasks may come into play.
An AI orchestration engine like Orkes Conductor can streamline such developmental efforts with its built-in AI tasks and feature-rich AI prompt builder, where you can test various prompt engineering strategies. With one-step integrations for a dozen LLM providers (including OpenAI, HuggingFace, and more), Orkes’ AI prompt builder unlocks the convenience of plugging, testing, and playing with prompts across models.
Besides testing, these prompts can be templatized, saved, and safely used in a production setting for AI-driven workflows and applications, such as document classification, RAG-based search retrieval, approval journeys, and more.
Wrap up
By implementing various prompt engineering tactics, we have explored effective ways to combine model choice, prompt writing, and parameter tuning and systematically test these changes.
Beyond what we have explored here, there are many other ways to improve model outputs, such as model fine-tuning or retrieval-augmented generation (RAG). A strategic combination of these different techniques and methods will unleash the full potential of generative AI in automating creative, complex, or human-involved tasks.
Using Orkes Conductor, you can rapidly build, optimize, and test AI-enabled applications in a distributed environment. Conductor is an open-source orchestration platform used widely in many mission-critical applications for orchestrating microservices, AI services, event handling, and more. Try out a fully-managed and hosted Conductor service with the 14-day free trial for Orkes Cloud.
Top comments (2)
A great blog! Loved the minute details!! Also folks, if you wanna know more around prompt experimenting you can explore this dev tool I came across - getmaxim.ai/
Hey folks, came across this post and thought it might be helpful for you! Check out this comprehensive guide to RAG in AI - Rag In AI.
Some comments have been hidden by the post's author - find out more