Why prompt structure matters
Diffusion models are conditioned on text embeddings, and a structured prompt tends to give a cleaner conditioning signal. The SDXL paper from Stability AI (Podell et al., 2023) makes the same point on the training side: it added micro-conditioning on original image size and crop coordinates, alongside a larger backbone and a second text encoder, and reported a clear human-preference win over SD 1.5. The lesson generalizes: the more structured signal you give the model, the more it can give you back.
OpenAI's DALL-E 3 system card credits its caption-improvement pipeline (rewriting training captions with a learned captioning model so they are longer and more descriptive) for the leap in prompt-following over DALL-E 2 (OpenAI, 2023). The same logic applies on the user side: a better-structured prompt at inference time gives the model more to follow, and the image tends to land closer to what you intended. Structure is the shared language between you and the model.
The seven-slot prompt formula
Memorize seven slots in order. Skip a slot only when you genuinely mean to. Each slot steers a different aspect of the image, so a missing slot is a decision you leave to the model.
| Slot | Example tokens |
|---|---|
| Subject | a weathered fisherman, 60s, deep wrinkles |
| Style | editorial photography, Magnum Photos aesthetic |
| Composition | medium close-up, rule of thirds, shallow DOF |
| Lighting | golden hour rim light, soft fill, overcast key |
| Mood | contemplative, melancholic, weather-worn |
| Technical | 85mm f/1.4, ISO 200, --ar 3:2 --stylize 250 |
| Negative | no text, no watermark, no extra fingers |
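If you assemble prompts in code, the table translates into a small helper. What follows is a minimal sketch, not any model's official API: the `PromptSlots` class and the `build_prompt` function are hypothetical names invented for illustration, and joining slots with a comma is an assumption that suits most tag-style prompts.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptSlots:
    """Hypothetical container; field order mirrors the seven-slot table."""
    subject: str = ""
    style: str = ""
    composition: str = ""
    lighting: str = ""
    mood: str = ""
    technical: str = ""
    negative: str = ""


def build_prompt(slots: PromptSlots, separator: str = ", ") -> str:
    """Join the non-empty slots in table order, skipping blank ones."""
    # fields() returns dataclass fields in definition order,
    # so the slot order of the table is preserved.
    parts = [getattr(slots, f.name).strip() for f in fields(slots)]
    return separator.join(p for p in parts if p)


# Example values taken from the table; model-specific flags left out.
example = PromptSlots(
    subject="a weathered fisherman, 60s, deep wrinkles",
    style="editorial photography, Magnum Photos aesthetic",
    composition="medium close-up, rule of thirds, shallow DOF",
    lighting="golden hour rim light, soft fill",
    mood="contemplative, melancholic",
    technical="85mm f/1.4, ISO 200",
    negative="no text, no watermark, no extra fingers",
)

if __name__ == "__main__":
    print(build_prompt(example))
```

Leaving a slot blank simply drops it from the output, which matches the "skip a slot only when you mean to" rule. For tools that take the negative prompt in a separate field, pass `negative` on its own instead of joining it into the main string.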
Five worked examples
One prompt per major model family. Steal the structure, swap the subject.