Biricik Biricik

Posted on Apr 11 • Edited on May 16 • Originally published at zsky.ai

The 5-Element Prompt Formula Every Photographer Knows

#ai #prompt #photography #tutorial

Every week I see another "ultimate AI prompt guide" post. They are usually a wall of magic words. "Masterpiece, 8k, ultra-detailed, award-winning, trending on ArtStation, octane render, unreal engine 5, sharp focus."

Half of those tokens do nothing useful on modern models. The rest are workarounds for a problem that photographers solved in the 1930s: how do you describe an image before it exists, so a technical system can produce it?

The answer has always been five elements:

Subject
Composition
Light
Style
Mood

That is it. Once I started writing prompts the same way I wrote shot lists for photo shoots, everything got easier. Fewer tokens, more consistent output, no magic words. I want to walk you through how this maps to AI image generation, with real side-by-side examples.

Context on where I am coming from: I am a photographer who built ZSky AI, a creativity platform for people who want to generate images and video without a learning curve. I have aphantasia — I cannot picture images in my head — so I depend entirely on language to describe what I am trying to make. If language fails, I get nothing. That constraint made me fanatical about prompt formulas that actually work.

Why the wall-of-words approach fails

Here is a prompt I pulled from a popular Reddit thread last week:

masterpiece, best quality, ultra-detailed, 8k uhd, sharp focus, intricate details, professional photography, award winning, a dog

Count the tokens doing real work. One: "a dog." The rest is noise. The model sees a wall of quality modifiers with no hierarchy. You might get a decent-looking dog, but you had no control over any other decision. What breed, what lighting, what lens, what mood — you left all of it up to the dice.
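If you want to run that count mechanically, here is a minimal sketch in Python. The MAGIC_WORDS set is my own illustrative list, built from the modifiers quoted in this post, and working_phrases is a hypothetical helper name, not part of any tool.

```python
# Quality modifiers quoted in this post. The list is an illustrative
# assumption, not an authoritative catalogue of useless tokens.
MAGIC_WORDS = {
    "masterpiece", "best quality", "ultra-detailed", "8k", "8k uhd",
    "sharp focus", "intricate details", "professional photography",
    "award winning", "award-winning", "trending on artstation",
    "octane render", "unreal engine 5",
}


def working_phrases(prompt: str) -> list[str]:
    """Return the comma-separated phrases that are not known magic words."""
    phrases = [p.strip() for p in prompt.split(",") if p.strip()]
    return [p for p in phrases if p.lower() not in MAGIC_WORDS]


reddit_prompt = (
    "masterpiece, best quality, ultra-detailed, 8k uhd, sharp focus, "
    "intricate details, professional photography, award winning, a dog"
)
print(working_phrases(reddit_prompt))  # ['a dog']
```

Nine phrases go in and one comes out, which is the whole problem with the prompt.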

A photographer's brief for the same shot would look like this:

a golden retriever sitting on wet concrete, low angle three-quarter view, overcast afternoon light, documentary photography, calm and tired

Nineteen words. Five elements. Zero magic words. And the output is specific enough that you and I could argue about whether the result matched the brief, which means we are actually communicating with the model instead of praying at it.

Element 1 — Subject

The single most common prompting mistake I see is a vague subject. Not "a dog." A golden retriever. Not "a person." A weathered fisherman in his sixties. Not "a city." A rain-wet alleyway in Osaka.

Specificity collapses the probability space. The model has fewer decisions to make about who or what you are showing, which means more of its attention goes into the elements you actually care about. This is why single-word subjects almost always underperform multi-word subjects on the same model.

A good subject has three parts:

Who or what it is — "a golden retriever"
A distinguishing trait — "old, with a greying muzzle"
A pose or action — "lying on a wooden dock"

Combine them:

an old golden retriever with a greying muzzle, lying on a wooden dock

That is a photograph waiting to happen. A model can see it. You can see it. We are in agreement about the subject.

Element 2 — Composition

This is the element most AI prompt guides skip entirely, and it is the one that changes the result the most. Composition is where the camera is, how much of the subject fills the frame, and how the eye moves across the image.

The photography vocabulary here is already standardized and the models know it:

Angle: low angle, high angle, eye level, Dutch angle, overhead, worm's-eye view, bird's-eye view
Framing: extreme close-up, close-up, medium shot, medium-wide, wide, extreme wide, full body, head and shoulders, portrait, environmental portrait
Depth: shallow depth of field, deep focus, bokeh background, foreground element, layered
Rule: rule of thirds, centered, leading lines, symmetry, negative space

Pick at most one from each category, and only when it matters. Four composition tokens are usually plenty:

low angle, medium shot, shallow depth of field, leading lines
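If you assemble prompts in code, the vocabulary above can live in a small lookup table. This is a rough sketch, assuming you want at most one token per category. COMPOSITION and composition_phrase are hypothetical names of mine, and the token lists are copied from the categories above.

```python
# Composition vocabulary from the four categories above.
COMPOSITION = {
    "angle": ["low angle", "high angle", "eye level", "Dutch angle",
              "overhead", "worm's-eye view", "bird's-eye view"],
    "framing": ["extreme close-up", "close-up", "medium shot", "medium-wide",
                "wide", "extreme wide", "full body", "head and shoulders",
                "portrait", "environmental portrait"],
    "depth": ["shallow depth of field", "deep focus", "bokeh background",
              "foreground element", "layered"],
    "rule": ["rule of thirds", "centered", "leading lines", "symmetry",
             "negative space"],
}


def composition_phrase(**choices: str) -> str:
    """Join at most one known token per category, in category order."""
    unknown = set(choices) - set(COMPOSITION)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    picked = []
    for category, options in COMPOSITION.items():
        token = choices.get(category)
        if token is None:
            continue  # categories that do not matter are simply left out
        if token not in options:
            raise ValueError(f"{token!r} is not a known {category} token")
        picked.append(token)
    return ", ".join(picked)


print(composition_phrase(
    angle="low angle",
    framing="medium shot",
    depth="shallow depth of field",
    rule="leading lines",
))
# low angle, medium shot, shallow depth of field, leading lines
```

Because each category is a keyword argument, you cannot pass two angles by accident, and skipping a category is just a matter of not passing it.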

Now compare two prompts with identical subjects:

Prompt A: a golden retriever on a dock

Prompt B: a golden retriever on a wooden dock, low angle, medium shot, shallow depth of field, the dock leading off toward the horizon

Prompt A gives you a dog. Prompt B gives you a photograph. Composition is the difference between "AI slop" and something you would hang on a wall.

Element 3 — Light

In photography, light is the thing. A great subject under bad light is a bad photo. A boring subject under extraordinary light is a portfolio piece. The same is true for generation models — they are trained mostly on photographs and paintings, which means they have an enormous vocabulary for light.

The tokens to keep in your pocket:

Time of day: dawn, golden hour, midday, blue hour, dusk, night
Quality: soft, hard, diffused, directional, backlit, rim lit, top lit
Source: window light, studio strobe, candlelight, neon, street lamps, overcast sky, direct sunlight
Color temperature: warm, cool, tungsten, mixed, monochromatic

You rarely need more than two light tokens. "Golden hour, backlit" is sometimes enough to carry an entire image.

Side by side:

Prompt A: a weathered fisherman in his sixties, medium shot

Prompt B: a weathered fisherman in his sixties, medium shot, golden hour, rim light from behind, warm tones

Same subject, same framing. Totally different photograph. The second one is lit. The first one is flat.
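A side-by-side like this is easy to script, so that the only difference between the two prompts is the element you are testing. A minimal sketch follows, with side_by_side as a hypothetical helper name. How you send the two prompts to a model is up to your tool and is not shown here.

```python
def side_by_side(base: str, *extra: str) -> dict[str, str]:
    """Return a control prompt (A) and a variant (B) with extra tokens appended."""
    variant = ", ".join([base, *extra])
    return {"A": base, "B": variant}


pair = side_by_side(
    "a weathered fisherman in his sixties, medium shot",
    "golden hour", "rim light from behind", "warm tones",
)
for label, prompt in pair.items():
    print(f"Prompt {label}: {prompt}")
```

If your generator exposes a seed setting (an assumption, since not every tool does), fixing it for both runs makes the comparison fairer.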

Element 4 — Style

Style in AI prompting is often where people cram every brand name they have ever heard. "In the style of Annie Leibovitz, Greg Rutkowski, Artgerm, Studio Ghibli, Caravaggio, Pixar." That is not a style, that is a focus group.

A style token should answer one question: what medium is this image in? Not who made it. The medium.

Good style tokens:

Documentary photography
Editorial fashion photography
Oil painting
Ink wash illustration
Charcoal sketch
Polaroid
Large format film
Cyanotype
35mm film

If you must reference an artist, reference exactly one, and only when their name genuinely describes a recognizable aesthetic. "In the style of" followed by twelve names is the AI equivalent of asking a graphic designer to make your logo "bold but minimal, playful but corporate, loud but subtle."

Element 5 — Mood

This is where the image stops being a picture and becomes a feeling. Mood tokens are the shortest part of a good prompt — usually one or two words — but they do a huge amount of work.

The trick is to pick words that describe the emotional state of the image, not the emotional state you want the viewer to have. Good mood tokens describe the scene:

Calm, tense, joyful, mournful, tender, uneasy, triumphant, melancholic, hopeful, lonely

Bad mood tokens describe the viewer's reaction:

Beautiful, stunning, amazing, breathtaking, perfect

Models already try to produce "beautiful" output. Telling them to make it beautiful is a no-op. Telling them the scene is "melancholic" is a real instruction.

The full formula in practice

Here is what a complete prompt looks like when you use all five elements, in order:

Subject: a weathered fisherman in his sixties mending a net
Composition: medium shot, eye level, rule of thirds, hands in focus
Light: overcast afternoon, soft diffused light from the left
Style: documentary photography, 35mm film
Mood: patient, meditative

Combine it into a single string:

a weathered fisherman in his sixties mending a net, medium shot at eye level, rule of thirds with hands in focus, overcast afternoon soft diffused light from the left, documentary photography on 35mm film, patient and meditative
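If you keep briefs in code, the same structure fits in a small dataclass. This is a sketch under my own assumptions: Brief and to_prompt are hypothetical names, and the join is a plain comma join in element order, so the phrasing inside each field is still up to you.

```python
from dataclasses import dataclass, fields


@dataclass
class Brief:
    # Field order is the element order used in this post.
    subject: str
    composition: str
    light: str
    style: str
    mood: str

    def to_prompt(self) -> str:
        """Join the non-empty elements into one comma-separated string."""
        parts = (getattr(self, f.name).strip() for f in fields(self))
        return ", ".join(p for p in parts if p)


brief = Brief(
    subject="a weathered fisherman in his sixties mending a net",
    composition="medium shot at eye level, rule of thirds with hands in focus",
    light="overcast afternoon soft diffused light from the left",
    style="documentary photography on 35mm film",
    mood="patient and meditative",
)
print(brief.to_prompt())
```

With the fields phrased this way, the printed string matches the combined prompt above word for word.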

That is a brief. Any photographer could go shoot that image. So can an AI model, which is the whole point.

A note on prompt length

Shorter prompts are usually better. A lot of older guides tell you to stuff prompts with 80+ tokens. In my experience, modern generation models tend to weight earlier tokens more heavily, so every extra word dilutes the first few. My rule of thumb is one sentence per element, max. If a prompt runs longer than 60 words total, I start deleting.
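The 60-word ceiling is simple to check automatically. Here is a minimal sketch, assuming whitespace-separated words are a good enough proxy. A model's tokenizer counts differently, so treat the number as a rough guide. MAX_WORDS, word_count and is_too_long are hypothetical names.

```python
MAX_WORDS = 60  # the rule-of-thumb ceiling from this post


def word_count(prompt: str) -> int:
    """Count whitespace-separated words (not model tokens)."""
    return len(prompt.split())


def is_too_long(prompt: str, limit: int = MAX_WORDS) -> bool:
    """True when the prompt exceeds the word limit."""
    return word_count(prompt) > limit


fisherman = (
    "a weathered fisherman in his sixties mending a net, medium shot at eye "
    "level, rule of thirds with hands in focus, overcast afternoon soft "
    "diffused light from the left, documentary photography on 35mm film, "
    "patient and meditative"
)
print(word_count(fisherman), is_too_long(fisherman))  # 37 False
```

The full five-element fisherman brief comes in at 37 words, well under the limit.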

If you want to see this in action, try the same prompt on the free image generator at ZSky twice — once as a minimal five-element version, once padded with magic words. The minimal version almost always wins.

The five-element checklist

Tape this to your monitor:

Is there a specific subject with a distinguishing trait and an action?
Have I told the model where the camera is and how much of the frame the subject fills?
Have I described the light — time, quality, direction?
Have I named the medium, not a focus group of artists?
Is there one mood word that describes the scene?

If the answer to all five is yes, you have a working prompt. If one is missing, add it. That is the entire discipline.
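The checklist can also be sketched as a small function. This version only checks that each element is present and that the mood is not one of the viewer-reaction words listed earlier. It cannot judge whether a subject is specific enough, so that part stays with you. ELEMENTS, VIEWER_REACTION and review_brief are hypothetical names.

```python
ELEMENTS = ("subject", "composition", "light", "style", "mood")

# Viewer-reaction words from the mood section of this post.
VIEWER_REACTION = {"beautiful", "stunning", "amazing", "breathtaking", "perfect"}


def review_brief(brief: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the brief passes."""
    problems = []
    for element in ELEMENTS:
        if not brief.get(element, "").strip():
            problems.append(f"missing {element}")
    mood_words = brief.get("mood", "").lower().replace(",", " ").split()
    for word in mood_words:
        if word in VIEWER_REACTION:
            problems.append(
                f"mood word {word!r} describes the viewer, not the scene"
            )
    return problems


draft = {
    "subject": "a golden retriever on a dock",
    "light": "golden hour, backlit",
    "mood": "beautiful",
}
for problem in review_brief(draft):
    print(problem)
# missing composition
# missing style
# mood word 'beautiful' describes the viewer, not the scene
```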

Why this matters more in 2026

AI image generation is now commoditized. The model you are using today is probably about as good as the model anyone else is using today. The people producing noticeably better images are not using secret tokens — they are writing better briefs.

This is the same story as every other creative tool in history. The cave painters did not have better rocks. The film photographers did not have better silver. The difference was always the brief. AI is not the revolution we thought it was. It is the same old discipline, on a new medium. If you want to go deeper on that, I wrote about it in the ZSky founder philosophy.

The five-element formula is the shortest path I know from "I want to make something" to a prompt that actually produces something. It is not clever. It is not new. It is just borrowed directly from the people who have been describing images to technical systems since 1839.
