Most engineers I know use Midjourney the same way they use ChatGPT: write a sentence describing the thing, hit enter, accept the output.
This is why every README banner, every blog hero image, every product mockup AI-generated by a dev looks like it was AI-generated by a dev. Soft lighting. Vague subject. Cinematic-ish but not really cinematic. The five-finger giveaway has been mostly fixed; the prompt-illiteracy giveaway hasn't.
Midjourney is not a language model wearing an image-generation hat. It's a camera with a strong stylistic prior. Prompting it like ChatGPT — natural-language paragraph, no parameters, no specific references — is like buying a DSLR and only ever using auto mode. It works. It also wastes about 80% of what the tool can actually do.
This post is for engineers who use Midjourney occasionally for work artifacts (README banners, marketing assets, mockups, blog illustrations) and want to stop generating slop without becoming a full-time prompt engineer. Six components. Each one has a default that's worth changing.
1. Lens — replaces the entire "composition" decision
If your prompt doesn't specify a lens, Midjourney picks one. The default it picks is roughly equivalent to a 50mm prime — neutral, no flattering distortion either way, mid-distance from subject. Fine for nothing in particular.
Lenses are the single highest-leverage component because they encode an entire bundle of decisions: distance from subject, depth of field, perspective distortion, what gets compressed and what gets stretched. A few that change the output dramatically:
- `shot on 24mm lens` — wide, slight distortion at the edges, dramatic perspective. For environments, architecture, anything where you want to feel the space. Makes interiors look bigger than they are.
- `shot on 85mm lens` — classic portrait compression. Background blur is creamy, the subject pops, distance feels collapsed. For headshots, product hero shots, anything where the subject is the entire point.
- `shot on 35mm lens` — documentary feel. Slightly wide, less compression, looks like a journalist took it. For lifestyle, candid, anything that should feel real rather than staged.
- `shot on 200mm telephoto` — extreme compression, background almost flattened against the subject. For dramatic isolation effects.
- `macro lens` — extreme close-up, shallow depth, the kind of detail you can't see with the naked eye.
Pick one before you write anything else.
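To feel the leverage, hold everything else constant and swap only the lens. An illustrative pair (my example; exact results vary by model version):

```
product shot of a ceramic mug on a walnut desk, shot on 85mm lens
product shot of a ceramic mug on a walnut desk, shot on 24mm lens
```

The 85mm version should isolate the mug against creamy blur; the 24mm version pulls the desk and the room into the frame. Same subject, two different photographs.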
2. Lighting — the second-highest-leverage component
Lighting is the difference between "AI image" and "image." Default Midjourney lighting is roughly soft daylight, slightly diffuse, no strong directional source. It looks fine and looks generic.
Lighting terms that make a real difference:
- `golden hour` / `blue hour` — time-of-day lighting with strong color cast. Golden = warm, low-angle, long shadows. Blue = cool, low light, slightly melancholy.
- `harsh midday sun` — high contrast, hard shadows. Looks like a documentary photo, not a stock photo.
- `Rembrandt lighting` — single light source at 45°, classic triangular cheek highlight. Portrait staple.
- `chiaroscuro` — extreme dark/light contrast, most of the frame in shadow. Dramatic, painterly.
- `softbox studio lighting` — even, diffuse, no shadows. The flatness of catalog photography.
- `practical lighting only` — only the light visible in the scene (lamps, windows, neon signs). Tells the model not to add invisible fill light.
- `backlit` / `silhouette` — light behind the subject. Mood-heavy, cuts detail, looks intentional.
The `practical lighting only` clause is a specific weapon. Default Midjourney adds invisible fill light to almost every scene because most stock photography does. Removing it pulls the image one step closer to looking like a real photograph.
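An illustrative pair (outputs vary, but the direction is reliable):

```
man reading in a dim study, softbox studio lighting
man reading in a dim study, practical lighting only, lit by a single desk lamp
```

The first should come back evenly lit and flat; the second should come back looking like a photo taken in that room, shadows and all.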
3. Film stock or sensor — the color/grain identity
Nobody's eye thinks about this consciously. Everyone's eye reads it instantly.
Film stocks are training-data shortcuts to entire visual identities. They reliably steer Midjourney toward consistent color palettes, contrast curves, and grain. A few that produce dramatically different outputs:
- `Kodak Portra 400` — warm, slightly desaturated, flattering on skin tones, neutral on greens. The Instagram-of-2018 look.
- `Fuji Velvia 50` — extremely saturated, especially in greens and reds. Landscape photography staple.
- `Cinestill 800T` — tungsten-balanced, halation around bright lights (the red glow). Cyberpunk staple.
- `Ilford HP5` — black and white, grainy, documentary feel.
- `Polaroid 600` — washed out, slightly soft, square-ish. Nostalgia in a single token.
- `shot on RED Komodo` — modern digital cinema look, clean, high dynamic range.
- `shot on iPhone 15` — phone-camera character, slightly aggressive HDR, computational sharpening.
The digital sensor variants are useful when film stocks feel too retro for your use case. "Shot on RED Komodo" looks contemporary in a way that "Kodak Portra 400" doesn't.
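Same scene, two stocks, as a rough illustration:

```
street scene at night in light rain, Cinestill 800T
street scene at night in light rain, shot on RED Komodo
```

Expect halation and a warm tungsten cast from the first, clean highlights and contemporary dynamic range from the second.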
4. Framing — what's visible, what's cropped, where the subject sits
Default framing is a centered, medium-distance subject, fully visible. Variants that make the image look thought-through:
- `extreme close-up`, `close-up`, `medium shot`, `wide shot`, `establishing shot` — pick consciously, don't let the model default.
- `subject in lower third`, `subject offset left`, `negative space upper right` — explicit composition rules.
- `over-the-shoulder shot` — implies a second figure even if invisible. Adds story.
- `low angle` / `high angle` / `Dutch angle` — camera position relative to the subject. Low angle = subject feels powerful. High angle = subject feels small. Dutch = unease.
- `cropped at the chest`, `cropped at the waist` — explicit framing for portraits.
The specific phrasing matters. "Wide shot" and "establishing shot" produce different framings even though they sound interchangeable.
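A quick way to verify this yourself, holding the subject constant (illustrative prompts, not a guarantee):

```
lighthouse on a cliff, wide shot
lighthouse on a cliff, establishing shot, subject in lower third
```

The establishing-shot version tends to shrink the subject and give the environment the frame; the composition rule pins where the lighthouse sits.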
5. Palette — the discipline most prompts skip
Most prompts mention zero colors and let Midjourney pick. The result is the visual equivalent of "medium" — nothing wrong, nothing memorable.
Three levels of palette control, each more aggressive than the last:
- Mood-level: `muted earth tones`, `cool palette`, `monochromatic blue`, `desaturated`, `vibrant primary colors`. These bias the model without forcing it.
- Specific palette: `palette of cream, terracotta, and sage green`. Three colors, named. The model will respect this surprisingly well.
- Color-grade reference: `color graded like Blade Runner 2049`, `color palette of Wes Anderson films`. Hard to abuse, very effective when applied to scenes that match the reference's energy.
For consistent brand assets — multiple images that need to feel like a set — the specific palette approach is the only one that holds across generations.
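For example, a hypothetical two-asset brand set that shares one named palette:

```
flat lay of a notebook and fountain pen, palette of cream, terracotta, and sage green, softbox studio lighting --ar 3:2
minimal desk with a laptop and coffee cup, palette of cream, terracotta, and sage green, softbox studio lighting --ar 3:2
```

Different subjects, same three named colors; the two images should read as one family.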
6. Aspect ratio — set it, don't accept the default
Midjourney defaults to 1:1 (square). For most actual use cases, this is wrong:
- `--ar 16:9` — blog hero images, README banners, video thumbnails.
- `--ar 3:2` — photographic standard. Looks more "photo" than 16:9.
- `--ar 4:5` — Instagram portrait, vertical content.
- `--ar 9:16` — TikTok/Reels, mobile-first hero images.
- `--ar 21:9` — ultra-wide, cinematic, dramatic.
The aspect ratio doesn't just crop the image — it changes what Midjourney generates. A 16:9 prompt produces a fundamentally different composition than the same prompt at 1:1, because the model is composing for the canvas it's been told it has.
Setting the aspect ratio first means you stop generating square images and cropping them to fit. The composition is correct from the start.
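One mechanical note: `--ar` is a parameter, and Midjourney parameters go at the end of the prompt, after all the descriptive text:

```
isometric illustration of a CI/CD pipeline, muted blues and grays --ar 16:9
```

That's a banner composed as a banner from the first generation, not a square image cropped after the fact.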
What this looks like assembled
Default prompt:

```
beautiful woman in a forest
```
This is the prompt people give Midjourney and then complain that the output looks generic. It is, by design, generic: six unspecified components, six defaults.
The same subject, every component specified:
```
portrait of a woman walking through a misty pine forest,
shot on 85mm lens, golden hour backlighting through trees,
Kodak Portra 400, medium shot cropped at the waist,
muted greens and warm skin tones, --ar 3:2
```
Not a longer prompt for the sake of being longer. A prompt where every word is doing a specific job. The output is repeatable across generations, and it looks like something a working photographer might have shot.
If you want this prompt to look different — say, you want a cyberpunk city version of the same subject — you swap the components, not the sentence:
```
portrait of a woman walking through a neon-lit alley,
shot on 35mm lens, practical lighting only, Cinestill 800T,
low angle medium shot, palette of teal and orange,
--ar 3:2
```
Same subject grammar. Different lens, lighting, film, framing, palette. Completely different image. This is the move that makes a prompt library compose instead of accumulate.
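If you keep your prompts in code, this composability falls out of a data structure. A minimal Python sketch, assuming you only want string assembly; the field names and the `build_prompt` helper are my own naming, not anything official:

```python
# Each asset is a subject plus the six named components.
FOREST_PORTRAIT = {
    "subject": "portrait of a woman walking through a misty pine forest",
    "lens": "shot on 85mm lens",
    "lighting": "golden hour backlighting through trees",
    "film": "Kodak Portra 400",
    "framing": "medium shot cropped at the waist",
    "palette": "muted greens and warm skin tones",
    "ar": "3:2",
}

def build_prompt(components: dict) -> str:
    """Assemble a Midjourney prompt string; parameters go last."""
    order = ["subject", "lens", "lighting", "film", "framing", "palette"]
    body = ", ".join(components[k] for k in order)
    return f"{body} --ar {components['ar']}"

# Swap components, not the sentence: the cyberpunk variant
# overrides six keys of the same structure.
CYBERPUNK_PORTRAIT = {
    **FOREST_PORTRAIT,
    "subject": "portrait of a woman walking through a neon-lit alley",
    "lens": "shot on 35mm lens",
    "lighting": "practical lighting only",
    "film": "Cinestill 800T",
    "framing": "low angle medium shot",
    "palette": "palette of teal and orange",
}

print(build_prompt(FOREST_PORTRAIT))
print(build_prompt(CYBERPUNK_PORTRAIT))
```

A new asset is a new dict; a restyle is a handful of key swaps; the grammar lives in one place.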
The prompt-engineering parallel
If you've spent time tuning LLM prompts, this should look familiar. The progression is the same:
- Beginner: writes the request as a sentence, accepts whatever the model produces.
- Intermediate: discovers a few magic phrases that improve outputs, tacks them on inconsistently.
- Advanced: identifies the components of a good prompt, treats each as a parameter, varies them deliberately.
Midjourney prompting follows this curve. Most people stay at stage 1 because the stage-1 outputs are passable and the stage-3 jump requires thinking like a cinematographer for an hour. The leverage is at stage 3.
Going further
If you'd rather not rebuild a Midjourney prompt library from scratch, the Midjourney Prompt Encyclopedia is 400 prompts already structured this way — each one has lens, lighting, film stock, framing, palette, and aspect ratio baked in, organized by use case (portraits, products, environments, brand mockups, blog headers). $19 lifetime.
But the structure above is the actual asset. Take any one of your existing flat prompts, layer the six components onto it, and regenerate. The improvement is bigger than you'd guess from how mechanical the change is.