"AI Image Prompting 2026 — The 8-Element Formula and How Each Tool Differs"

Subject · Scene · Camera · Lighting · Style — one structure that works across Nano Banana, Midjourney, and GPT Image


Key summary

  • Audience: You read the tool comparison and picked one — but your prompts keep collapsing into the same generic look or missing the brief.
  • What you'll get: 1) The 8-element prompt formula that works in all three tools, 2) tool-specific differences (Midjourney parameters, Nano Banana's natural language, GPT Image's conversational edits), 3) before/after concrete examples, 4) seven common mistakes, 5) weights and negative prompts.
  • Mental model: A prompt isn't a list of words. It's a director's note to a camera crew — lens, light, blocking, mood — not "make it pretty."

1. The 8-element formula (works in all tools)

The 2026 official guides from Black Forest Labs, Anthropic, and Midjourney all converge on the same order:

[Subject] + [Scene/Environment] + [Composition/Shot] + [Camera/Lens]
+ [Lighting] + [Style/Medium] + [Color/Mood] + [Quality/Negatives]

1.1 What goes where

# | Element | Example phrasing
1 | Subject | "Korean woman in her 30s," "black cat," "lighthouse on a cliff"
2 | Scene | "rainy Tokyo alley," "sunlit cafe window seat"
3 | Composition | "close-up," "wide-angle," "rule of thirds," "over-the-shoulder"
4 | Camera/Lens | "35mm film," "85mm portrait lens," "shallow depth of field"
5 | Lighting | "golden hour," "rim light," "softbox front," "neon back light"
6 | Style/Medium | "photorealistic," "watercolor," "oil painting," "Studio Ghibli style"
7 | Color/Mood | "muted pastel palette," "high contrast," "moody, melancholic"
8 | Quality/Negatives | "8K, sharp focus" / "no text, no watermark"

1.2 Order matters

Diffusion models put more weight on words near the front (2026 guide). Put your primary subject and key action in the first 10–15 words.
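A quick way to enforce the front-loading rule is to check it mechanically. The sketch below is only an illustration: the function name `leads_with`, the 15-word window default, and the simple substring matching are assumptions, not part of any tool's API.

```python
import re

def leads_with(prompt: str, key_terms: list[str], window: int = 15) -> bool:
    """Return True if every key term appears within the first `window` words."""
    # Tokenize into lowercase words, dropping punctuation such as commas.
    words = re.findall(r"[\w'-]+", prompt.lower())
    head = " ".join(words[:window])
    # Plain substring match: good enough for a sanity check, not a parser.
    return all(term.lower() in head for term in key_terms)

good = "A black cat leaping across a rainy Tokyo alley, neon backlight, 35mm film"
bad = ("Cinematic, moody, 35mm film, neon backlight, muted palette, "
       "rainy Tokyo alley at night, shallow depth of field, with a black cat leaping")

print(leads_with(good, ["black cat", "leaping"]))  # True
print(leads_with(bad, ["black cat", "leaping"]))   # False
```

In the second prompt the subject only arrives after the 15-word window, which is exactly the pattern this section warns against.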

1.3 Full example

"A 30-something Korean woman in a beige trench coat [Subject] walking through a rainy Seoul alley at night [Scene] medium shot, slight low angle [Composition] shot on 50mm prime, shallow depth of field [Camera] soft neon backlight from shop signs [Lighting] photorealistic, cinematic still [Style] muted teal and amber palette, melancholic [Mood] 8K, sharp focus, no watermark [Quality]"

This works as-is in Nano Banana 2, Midjourney, and GPT Image 1.5.
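If you assemble prompts in code, the formula maps naturally onto an ordered record. The following is a minimal sketch, assuming a hypothetical `PromptSpec` dataclass whose field order mirrors the eight elements; none of the names come from the tools themselves.

```python
from dataclasses import dataclass, fields

@dataclass
class PromptSpec:
    # Field order mirrors the 8-element formula; render() relies on it.
    subject: str
    scene: str = ""
    composition: str = ""
    camera: str = ""
    lighting: str = ""
    style: str = ""
    mood: str = ""
    quality: str = ""

    def render(self) -> str:
        """Join the non-empty elements in formula order."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)

spec = PromptSpec(
    subject="A 30-something Korean woman in a beige trench coat",
    scene="walking through a rainy Seoul alley at night",
    composition="medium shot, slight low angle",
    camera="shot on 50mm prime, shallow depth of field",
    lighting="soft neon backlight from shop signs",
    style="photorealistic, cinematic still",
    mood="muted teal and amber palette, melancholic",
    quality="8K, sharp focus, no watermark",
)
print(spec.render())
```

Because empty elements are skipped, the same class covers short prompts too: `PromptSpec(subject="black cat", lighting="rim light").render()` gives `"black cat, rim light"`.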


2. Where the tools diverge

2.1 Nano Banana 2 (natural-language friendly)

Plain prose works fine. Text rendering is strong, so you can directly request on-image text.

"A book cover for 'AI for Beginners' — minimalist white background, serif title in black, geometric illustration of a circuit board with leaves growing out of it, soft gradient orange-to-yellow accent, clean modern editorial design"

2.2 Midjourney v7 (parameter-driven precision)

Use parameters to fine-tune style strength, diversity, and consistency (official parameters).

Parameter | Effect | Recommended values
--s (stylize) | Aesthetic strength | 100 (default), 50 (faithful to prompt), 750 (heavy stylization)
--c (chaos) | Diversity across the four outputs | 0–50 normal, 50–100 experimental
--ar (aspect ratio) | Aspect ratio | 16:9, 2:3, 1:1
--seed | Locks the seed | Vary one element while keeping the base
--sref | Style-reference URL | Mimic the style of another image
--oref | Omni-reference URL (a character or object to carry over) | Keep a person consistent across prompts

Example:

A medieval castle on a cliff, sunrise, cinematic, fog --ar 16:9 --s 250 --c 30

V7 specials: personalization profiles (--p) apply your trained taste; Draft Mode (--draft) gives 10× faster ideation.
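When prompts are generated in bulk, it can help to append the parameters programmatically and reject out-of-range values early. This is a sketch only: `with_mj_params` is a hypothetical helper, and the ranges it checks (0 to 1000 for stylize, 0 to 100 for chaos) should be confirmed against the current Midjourney documentation.

```python
from typing import Optional

def with_mj_params(prompt: str, ar: str = "1:1", stylize: int = 100,
                   chaos: int = 0, seed: Optional[int] = None) -> str:
    """Append Midjourney-style parameters to a prompt string."""
    if not 0 <= stylize <= 1000:
        raise ValueError("stylize is assumed to range from 0 to 1000")
    if not 0 <= chaos <= 100:
        raise ValueError("chaos is assumed to range from 0 to 100")
    width, _, height = ar.partition(":")
    if not (width.isdigit() and height.isdigit()):
        raise ValueError("aspect ratio must look like '16:9'")
    parts = [prompt.strip(), f"--ar {ar}", f"--s {stylize}", f"--c {chaos}"]
    if seed is not None:
        parts.append(f"--seed {seed}")
    return " ".join(parts)

print(with_mj_params("A medieval castle on a cliff, sunrise, cinematic, fog",
                     ar="16:9", stylize=250, chaos=30))
# A medieval castle on a cliff, sunrise, cinematic, fog --ar 16:9 --s 250 --c 30
```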

2.3 GPT Image 1.5 (conversational)

Its strength isn't the first generation — it's iterative editing. Use the 8-element formula on the first call, then plain conversation afterwards.

1st: "A young man holding an espresso cup, café window seat, morning light, photorealistic, 50mm lens, shallow depth of field"
2nd: "Same image, but change the cup to a glass of orange juice"
3rd: "Now add a dog sleeping under the table"

Each edit builds on the previous image, so earlier changes persist. Of the three tools, this gives the strongest consistency across edits.


3. Before / After

Before (vague)

"Pretty landscape photo"

After 1 (specific)

"A photorealistic landscape of a quiet mountain lake at golden hour, mirror-like water reflection, autumn maple trees at the shore, mist rising from the surface, wide-angle composition, shot on 24mm lens, warm orange and teal palette, sharp focus, 8K"

After 2 (Midjourney parameters added)

A photorealistic landscape of a quiet mountain lake at golden hour, mirror-like water reflection, autumn maple trees at the shore, mist rising from the surface, wide-angle composition, shot on 24mm lens, warm orange and teal palette, sharp focus --ar 21:9 --s 200 --c 20

After 3 (GPT Image conversational)

After the first generation: "Same scene but at twilight with a faint full moon over the mountains."

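The intent is the same in every version, but the specific prompts give the model far more to work with. Subjectively, the quality gap is on the order of 5–10×.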


4. Seven common mistakes

Mistake | Result | Fix
1. Stacking adjectives ("amazing, beautiful, stunning") | Mostly ignored | Replace with concrete description ("misty rim light, gold-tipped autumn leaves")
2. Negatives ("not blurry") | Ignored or reversed | Use positive form ("sharp focus, fine detail")
3. Too many elements at once | Some get dropped | Keep 3–5 priorities, push the rest into edits
4. Repeating the same word for emphasis | No effect | Use weights instead: word::2 in Midjourney, ((emphasis)) in Stable Diffusion-style UIs
5. Generic quality tags ("8K, ultra-realistic") | Weak signal | Describe actual detail ("pores visible on skin, fabric texture")
6. Missing detail anchors for people | Hands and eyes break | Add anchors such as "natural hands, anatomically correct, sharp eyes"
7. Not pinning a seed | Can't iterate on a result | Pass --seed in Midjourney; in GPT Image, save a generated image and edit from it
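Several of these mistakes can be caught before a prompt is ever submitted. The linter below is a rough sketch: the word lists, thresholds, and the name `lint_prompt` are all illustrative assumptions, and it only covers mistakes 1, 2, 4, and 5.

```python
import re
from collections import Counter

# Illustrative word lists; extend them for your own prompts.
VAGUE_ADJECTIVES = {"amazing", "beautiful", "stunning", "pretty", "gorgeous"}
GENERIC_QUALITY = {"8k", "4k", "ultra-realistic", "masterpiece"}
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "and", "with", "to"}

def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for common prompt mistakes (heuristic only)."""
    text = prompt.lower()
    words = re.findall(r"[a-z0-9][a-z0-9-]*", text)
    warnings = []

    vague = sorted(VAGUE_ADJECTIVES.intersection(words))
    if len(vague) >= 2:  # mistake 1: stacked adjectives
        warnings.append(f"stacked adjectives: {', '.join(vague)}")

    if re.search(r"\b(?:not|without|don't)\b", text):  # mistake 2
        warnings.append("negative phrasing: describe what you want instead")

    repeated = sorted(w for w, n in Counter(words).items()
                      if n >= 3 and w not in STOPWORDS)
    if repeated:  # mistake 4: repetition used as emphasis
        warnings.append(f"repeated words: {', '.join(repeated)}")

    generic = sorted(GENERIC_QUALITY.intersection(words))
    if generic:  # mistake 5: generic quality tags
        warnings.append(f"generic quality tags: {', '.join(generic)}")
    return warnings

for warning in lint_prompt("amazing, beautiful, stunning landscape, not blurry, 8K"):
    print(warning)
```

A prompt built from concrete description, such as "quiet mountain lake at golden hour, mist rising from the surface, sharp focus", passes with no warnings.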

5. Weights and negative prompts

5.1 Weights (Midjourney)

:: splits the prompt into segments, and the number after it sets that segment's relative influence.

red sports car::3, urban street::1, neon signs::0.5

→ The car gets 3× weight, the street is baseline, neon is downweighted.
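To see what those numbers mean relative to each other, you can normalize them into shares. This is a sketch for reasoning about a prompt, assuming weights are best read as ratios; it does not describe how Midjourney handles them internally, and `parse_weights` and `relative_shares` are hypothetical helpers. Segments without an explicit weight are not handled.

```python
import re

def parse_weights(prompt: str) -> dict[str, float]:
    """Split 'text::w, text::w' into {text: weight}."""
    pairs = re.findall(r"([^:]+?)::(-?\d+(?:\.\d+)?)", prompt)
    # Strip the separators (spaces, commas) left around each segment.
    return {text.strip(" ,"): float(weight) for text, weight in pairs}

def relative_shares(weights: dict[str, float]) -> dict[str, float]:
    """Express each weight as a fraction of the total."""
    total = sum(weights.values())
    return {text: round(w / total, 3) for text, w in weights.items()}

weights = parse_weights("red sports car::3, urban street::1, neon signs::0.5")
print(relative_shares(weights))
# {'red sports car': 0.667, 'urban street': 0.222, 'neon signs': 0.111}
```

Read this way, doubling every weight changes nothing; only the ratios between segments matter.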

5.2 Negative prompts (Stable Diffusion-family, Midjourney --no)

--no text, watermark, signature, blur, low quality

Midjourney doesn't auto-honor negative phrasing — use the --no parameter. Nano Banana 2 and GPT Image have weaker negative-prompt support; prefer positive phrasing.
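One way to keep a single list of unwanted features and still respect each tool's behavior is to translate it per tool. The sketch below is an assumption-heavy illustration: the tool labels, the `POSITIVE_FORM` mapping, and `apply_negatives` are all hypothetical, and terms with no positive equivalent are simply dropped for the non-Midjourney tools.

```python
# Hypothetical mapping from an unwanted feature to a positive description.
POSITIVE_FORM = {
    "blur": "sharp focus",
    "low quality": "fine detail",
    "noise": "clean, smooth tones",
}

def apply_negatives(prompt: str, negatives: list[str], tool: str) -> str:
    """Attach negatives as --no for Midjourney, as positive phrasing elsewhere."""
    if tool == "midjourney":
        return f"{prompt} --no {', '.join(negatives)}"
    # Other tools: keep only the terms that have a positive rewrite.
    positives = [POSITIVE_FORM[n] for n in negatives if n in POSITIVE_FORM]
    return ", ".join([prompt, *positives])

base = "a lighthouse on a cliff"
print(apply_negatives(base, ["blur", "watermark"], "midjourney"))
# a lighthouse on a cliff --no blur, watermark
print(apply_negatives(base, ["blur", "watermark"], "gpt-image"))
# a lighthouse on a cliff, sharp focus
```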

5.3 Reference images

Tool | How
Nano Banana 2 | Attach an image + natural language ("in this style")
Midjourney | --sref [URL] for style, --oref [URL] for character consistency
GPT Image | Attach an image, then say "in this style"

6. Copy-paste starter templates

Portrait

[Person description] in [Location], [shot type] shot, [Lighting], 
shot on [Lens] with shallow depth of field, photorealistic, 
[Mood] mood, 8K, sharp focus, natural hands and eyes

Landscape

A photorealistic landscape of [Subject] at [Time of day], 
[weather/atmosphere], [composition type] composition, 
shot on [Lens], [color palette] palette, sharp focus, 8K

Illustration / concept art

[Subject] in [Setting], [art style — e.g., Studio Ghibli / Moebius / 
watercolor], [color palette], [lighting], detailed line art, 
[mood], --ar 16:9 --s 400

Product mockup

[Product] on [Surface], studio lighting with softbox front, 
[background — clean white / wooden table], 50mm macro lens, 
shallow depth of field, photorealistic, commercial photography

Book cover / poster

A book cover design for "[Title]" — [layout description], 
[typography — serif/sans, color], [illustration concept], 
[color palette], minimalist editorial design, --ar 2:3
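The bracketed slots in these templates translate directly into Python format fields. Here is a minimal sketch using the landscape template; the slot names and the `fill` helper are assumptions chosen for illustration, and the check for missing slots is there so a half-filled template never reaches a generator.

```python
from string import Formatter

# The landscape template, with bracketed slots renamed to format fields.
LANDSCAPE = (
    "A photorealistic landscape of {subject} at {time_of_day}, "
    "{atmosphere}, {composition} composition, "
    "shot on {lens}, {palette} palette, sharp focus, 8K"
)

def fill(template: str, **slots: str) -> str:
    """Fill a template, refusing to proceed if any slot is left empty."""
    needed = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = needed - set(slots)
    if missing:
        raise KeyError(f"unfilled slots: {sorted(missing)}")
    return template.format(**slots)

prompt = fill(
    LANDSCAPE,
    subject="a quiet mountain lake",
    time_of_day="golden hour",
    atmosphere="mist rising from the surface",
    composition="wide-angle",
    lens="24mm lens",
    palette="warm orange and teal",
)
print(prompt)
```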

Developer notes

  1. Templatize prompts: Python f-strings or LangChain PromptTemplate with {subject}, {lighting}, etc. — essential when generating 100+ images.
  2. Automate quality scoring: GPT-5 Vision or Claude Vision can score "prompt fidelity." Auto-regenerate anything below threshold.
  3. Save (seed, prompt, model_version): a small DB makes good outputs reproducible.
  4. Nano Banana 2 batch: API supports n=4 per call → choose the best automatically.
  5. Midjourney --sref / --oref automation: not recommended via unofficial bots — ToS and stability concerns. Stick with OpenAI/Gemini for production automation.
  6. IP-safety filter: pre-filter prompts for real-person and brand mentions before submission.
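Notes 2 and 3 combine into a small loop: generate, score, record, and retry with a new seed until the score clears a threshold. The sketch below deliberately avoids any real image or vision API. `generate` and `score` are caller-supplied functions, and the table layout, threshold, and stub implementations are all assumptions for illustration.

```python
import sqlite3
from typing import Callable, Optional

def generate_until_good(prompt: str,
                        generate: Callable[[str, int], bytes],
                        score: Callable[[str, bytes], float],
                        db: sqlite3.Connection,
                        model_version: str,
                        threshold: float = 0.8,
                        max_tries: int = 3) -> Optional[int]:
    """Try seeds 1..max_tries, log every attempt, return the first passing seed."""
    db.execute("CREATE TABLE IF NOT EXISTS runs "
               "(seed INTEGER, prompt TEXT, model_version TEXT, score REAL)")
    for seed in range(1, max_tries + 1):
        image = generate(prompt, seed)
        fidelity = score(prompt, image)
        # Record (seed, prompt, model_version) so a good output is reproducible.
        db.execute("INSERT INTO runs VALUES (?, ?, ?, ?)",
                   (seed, prompt, model_version, fidelity))
        db.commit()
        if fidelity >= threshold:
            return seed
    return None  # nothing cleared the threshold

# Stubs standing in for a real image generator and a vision-model scorer.
def fake_generate(prompt: str, seed: int) -> bytes:
    return f"{prompt}:{seed}".encode()

def fake_score(prompt: str, image: bytes) -> float:
    return 0.9 if image.endswith(b":2") else 0.5

db = sqlite3.connect(":memory:")
best_seed = generate_until_good("black cat, rim light", fake_generate,
                                fake_score, db, model_version="demo-model")
print(best_seed)  # 2
```

Swapping the stubs for real API calls leaves the loop unchanged, and the `runs` table doubles as an audit trail of which seeds were rejected.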


This is part 5-2 of 11 in the AI Basics series. Next: AI voice/video — Suno, Runway, and Sora.
