The five-element prompt: a checklist for prompts that don't drift
Most production prompts fail for one reason: missing structure. Here's the five-element checklist that turns a sloppy draft into one that ships.
The gap between a prompt that works on three examples and a prompt that works in production is wider than it looks. Most of that gap comes down to structure. A prompt that names exactly what it wants and bounds what it doesn't will usually outperform one that hopes for the best, often by a wide margin, whichever modern LLM it runs on.
Here's the five-element checklist we run through whenever a prompt is going into anything bigger than a one-off chat.
1. Role
Tell the model who it's being. Not metaphorically — explicitly.
You are a senior backend engineer reviewing pull requests for race conditions.
That's better than "please review this code" because it steers the model toward a specific body of knowledge. Asked to review code without a role, the same model tends to give generic feedback. With a role, it focuses on the failure mode you named.
The role should be narrow. "You are an expert in X" is fine. "You are an expert in everything" is not — broad roles flatten the response.
2. Context
What does the model need to know to do this well? Not everything you know — the relevant subset.
Context:
- This is a Node.js service using TypeORM against a Postgres 16 database.
- Concurrency is high — 200-500 requests per second per pod.
- We've previously had bugs where uniqueness constraints fired under race conditions because the read-then-write pattern doesn't use a transaction.
Context matters more for code review and analysis than for content generation. For a creative task, "this is for an internal team newsletter" is enough. For a debugging task, you want the constraints and the prior failure modes.
3. Task
A clear directive. Not "help me with this" — a specific verb.
Task: review the diff below for potential race conditions or data integrity issues.
The verb is doing work. "Review" is different from "summarise" is different from "fix." Pick one and commit to it. If you want multiple things, ask for them in sequence — one task per call gives crisper output than a kitchen-sink request.
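One way to picture "one task per call" is a small loop that sends each task separately and feeds the previous result into the next call. This is a minimal sketch, not a real client: `call_model` is a hypothetical stand-in for whatever function sends a prompt to your model, and `echo_model` exists only so the example runs without network access.

```python
from typing import Callable, List


def run_in_sequence(
    tasks: List[str],
    material: str,
    call_model: Callable[[str], str],
) -> List[str]:
    """Send one task per call; each call receives the previous call's output."""
    outputs: List[str] = []
    current = material
    for task in tasks:
        # Each prompt carries exactly one verb and one piece of input.
        prompt = f"Task: {task}\n\nInput:\n{current}"
        current = call_model(prompt)
        outputs.append(current)
    return outputs


def echo_model(prompt: str) -> str:
    """Hypothetical stub standing in for a real model call."""
    first_line = prompt.splitlines()[0]
    return f"(model output for: {first_line})"


if __name__ == "__main__":
    results = run_in_sequence(
        ["review the diff below for race conditions", "summarise the findings"],
        "<diff goes here>",
        echo_model,
    )
    for result in results:
        print(result)
```

Keeping the loop outside the prompt means each step can be logged, retried, or swapped out on its own.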
4. Constraints
Things the model must avoid, must include, or must obey. This is where most prompts leak quality.
Constraints:
- Only flag issues you can defend with a specific scenario; no speculative concerns.
- Prefer transactional fixes over locking; we're trying to reduce lock contention.
- Do not propose schema changes — the schema is out of scope for this review.
Constraints prevent drift. Without them, the model fills its response with hedges, caveats, and tangents. With them, the model has guardrails — and the output stays inside the lane you wanted.
5. Output format
Be explicit. Markdown sections, JSON schema, plain text only, list of N items, max word count — whatever you want, name it.
Output:
- A markdown list of issues found.
- Each issue: severity (CRITICAL/HIGH/MEDIUM), one-sentence summary, code snippet illustrating the race, recommended fix.
- If no issues found, say "No race conditions detected" and stop.
For prompts that produce structured data (anything that feeds another program), output format is the single most important element. JSON schema, regex pattern, explicit field list — name it precisely. Modern models like Claude Sonnet 4.6 and GPT-4o follow output format constraints reliably when they're stated up front.
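When the output feeds another program, it can help to check the reply against the field list you asked for before anything downstream touches it. The sketch below assumes the prompt requested a JSON array of issue objects. The field names (`severity`, `summary`, `snippet`, `fix`) are illustrative choices based on the example above, not a fixed schema.

```python
import json
from typing import Any, Dict, List

# Assumed contract: these names mirror the example output spec, not a standard.
ALLOWED_SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM"}
REQUIRED_FIELDS = {"severity", "summary", "snippet", "fix"}


def parse_issues(reply: str) -> List[Dict[str, Any]]:
    """Parse a model reply as a JSON array of issues, or raise ValueError."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(data, list):
        raise ValueError("expected a JSON array of issues")

    for index, issue in enumerate(data):
        if not isinstance(issue, dict):
            raise ValueError(f"issue {index} is not a JSON object")
        missing = REQUIRED_FIELDS - set(issue)
        if missing:
            raise ValueError(f"issue {index} is missing fields: {sorted(missing)}")
        if issue["severity"] not in ALLOWED_SEVERITIES:
            raise ValueError(f"issue {index} has unknown severity: {issue['severity']!r}")

    return data
```

An empty array passes the check, which gives the "no issues found" case a machine-readable form.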
Why this works
The five-element pattern works for the same reason a good function signature works: clarity of contract. The model knows who it's being, what it's working with, what it's doing, what it can't do, and what shape the answer takes. Every degree of freedom you remove from the input is a degree of variance you remove from the output.
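The function-signature analogy can be made fairly literal. As a rough sketch, the five elements become required fields on a dataclass, so a prompt cannot be rendered with one of them left out. `PromptSpec` and its `render` method are hypothetical names for illustration, and the section labels simply follow the examples in this post.

```python
from dataclasses import dataclass
from typing import List


def _bullets(items: List[str]) -> str:
    """Render a list of strings as a dash-prefixed bullet list."""
    return "\n".join(f"- {item}" for item in items)


@dataclass
class PromptSpec:
    # All five fields are required: omitting one raises TypeError on construction.
    role: str
    context: List[str]
    task: str
    constraints: List[str]
    output_format: List[str]

    def render(self) -> str:
        """Join the five elements into one prompt, in checklist order."""
        sections = [
            self.role,
            "Context:\n" + _bullets(self.context),
            "Task: " + self.task,
            "Constraints:\n" + _bullets(self.constraints),
            "Output:\n" + _bullets(self.output_format),
        ]
        return "\n\n".join(sections)


if __name__ == "__main__":
    spec = PromptSpec(
        role="You are a senior backend engineer reviewing pull requests for race conditions.",
        context=["This is a Node.js service using TypeORM against a Postgres 16 database."],
        task="review the diff below for potential race conditions or data integrity issues.",
        constraints=["Do not propose schema changes."],
        output_format=["A markdown list of issues found."],
    )
    print(spec.render())
```

The point of the sketch is the contract: a missing element fails loudly when the prompt is built, not quietly when the output drifts.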
This isn't theoretical. Run a prompt with the five elements named explicitly against the same prompt without them, and the outputs differ measurably across modern LLMs, including Claude, GPT-4, Gemini, Llama 3, and DeepSeek. The lift compounds when the prompt is reused thousands of times, as it is in any product feature.
When not to bother
Conversational chat doesn't need this. If you're asking the model "what's a good restaurant near me" or "explain transformer architecture," structuring the prompt with role/context/task/constraints/output is overkill — the cost outweighs the benefit.
The pattern earns its keep when the prompt:
- Gets used more than once
- Lives in a product feature
- Feeds another program (structured output)
- Has a non-obvious failure mode you've already seen
For anything fitting those criteria, the five-element checklist is the lowest-effort way to make the output predictable.
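For prompts that meet those criteria, the checklist can also run as a crude lint over a draft. The sketch below assumes the draft uses the same labels as the examples in this post ("You are ...", "Context:", "Task:", "Constraints:", "Output:"). The patterns are heuristics: a prompt that states its role in some other way would be flagged even though the element is present.

```python
import re
from typing import Dict, List, Pattern

_FLAGS = re.IGNORECASE | re.MULTILINE

# Heuristic patterns keyed by element name, in checklist order.
CHECKS: Dict[str, Pattern[str]] = {
    "role": re.compile(r"^\s*You are\b", _FLAGS),
    "context": re.compile(r"^\s*Context:", _FLAGS),
    "task": re.compile(r"^\s*Task:", _FLAGS),
    "constraints": re.compile(r"^\s*Constraints:", _FLAGS),
    "output format": re.compile(r"^\s*Output(?: format)?:", _FLAGS),
}


def missing_elements(prompt: str) -> List[str]:
    """Return the checklist elements the draft does not appear to contain."""
    return [name for name, pattern in CHECKS.items() if not pattern.search(prompt)]


if __name__ == "__main__":
    draft = "You are a senior backend engineer.\nTask: review the diff below."
    print(missing_elements(draft))
```

A check like this fits in a unit test or a pre-commit hook next to the prompt file, so a missing element is caught before the prompt ships.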
A tool that does this for you
If you have a draft prompt and you want it restructured into this pattern without writing the five sections by hand, our prompt refiner does exactly that. Paste a rough draft, pick a tone and target model, get back a structured version with the five elements named. Local-only — your prompt isn't sent to a server.
Further reading
- Token counter — once your prompt has structure, budget the tokens before you ship it.
- MCP servers — for prompts that need to reach external systems (GitHub, Postgres, Slack), the structured prompt pairs with a structured tool-calling layer.