May 1, 2026 · 5 min read · llms / prompt engineering / fundamentals

The five-element prompt: a checklist for prompts that don't drift

Most production prompts fail for one reason: missing structure. Here's the five-element checklist that turns a sloppy draft into one that ships.

The gap between a prompt that works on three examples and a prompt that works in production is wider than it looks. Most of that gap comes down to structure. A prompt that names exactly what it wants and bounds what it doesn't will typically outperform one that hopes for the best, whichever modern LLM it runs on.

Here's the five-element checklist we run through whenever a prompt is going into anything bigger than a one-off chat.

1. Role

Tell the model who it's being. Not metaphorically — explicitly.

You are a senior backend engineer reviewing pull requests for race conditions.

That's better than "please review this code" because it steers the model toward a specific slice of what it knows. Asked to review code without a role, the same model tends to give generic feedback. With a role, it focuses on the failure mode you named.

The role should be narrow. "You are an expert in X" is fine. "You are an expert in everything" is not — broad roles flatten the response.

2. Context

What does the model need to know to do this well? Not everything you know — the relevant subset.

Context:
- This is a Node.js service using TypeORM against a Postgres 16 database.
- Concurrency is high — 200-500 requests per second per pod.
- We've previously had bugs where uniqueness constraints fired under race conditions because the read-then-write pattern doesn't use a transaction.

Context matters more for code review and analysis than for content generation. For a creative task, "this is for an internal team newsletter" is enough. For a debugging task, you want the constraints and the prior failure modes.

3. Task

A clear directive. Not "help me with this" — a specific verb.

Task: review the diff below for potential race conditions or data integrity issues.

The verb is doing work. "Review" is different from "summarise" is different from "fix." Pick one and commit to it. If you want multiple things, ask for them in sequence — one task per call gives crisper output than a kitchen-sink request.
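A minimal sketch of "one task per call," assuming you want a review followed by a fix. The `call_model` parameter is a hypothetical stand-in for whatever client function you use to send a prompt and get text back; it is not a real library API, and the wording of the second prompt is only illustrative.

```python
from typing import Callable


def review_then_fix(call_model: Callable[[str], str], diff: str) -> dict[str, str]:
    """Run two single-verb prompts in sequence instead of one combined request.

    call_model is a hypothetical callable: prompt text in, reply text out.
    """
    # Call 1: one verb only ("review").
    review = call_model(
        "Task: review the diff below for potential race conditions "
        "or data integrity issues.\n\n" + diff
    )
    # Call 2: a different verb ("fix"), fed with the output of call 1.
    fix = call_model(
        "Task: fix the issues listed below in the diff that follows.\n\n"
        "Issues:\n" + review + "\n\nDiff:\n" + diff
    )
    return {"review": review, "fix": fix}
```

Keeping the calls separate also means each reply can be checked on its own before the next step runs.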

4. Constraints

Things the model must avoid, must include, or must obey. This is where most prompts leak quality.

Constraints:
- Only flag issues you can defend with a specific scenario; no speculative concerns.
- Prefer transactional fixes over locking; we're trying to reduce lock contention.
- Do not propose schema changes — the schema is out of scope for this review.

Constraints prevent drift. Without them, the model tends to pad its response with hedges, caveats, and tangents. With them, it has guardrails, and the output stays in the lane you wanted.

5. Output format

Be explicit. Markdown sections, JSON schema, plain text only, list of N items, max word count — whatever you want, name it.

Output:
- A markdown list of issues found.
- Each issue: severity (CRITICAL/HIGH/MEDIUM), one-sentence summary, code snippet illustrating the race, recommended fix.
- If no issues found, say "No race conditions detected" and stop.

For prompts that produce structured data (anything that feeds another program), output format is the single most important element. JSON schema, regex pattern, explicit field list — name it precisely. Modern models like Claude Sonnet 4.6 and GPT-4o follow output format constraints reliably when they're stated up front.
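When the output feeds another program, it helps to check the reply against the format you asked for before using it. The sketch below assumes the prompt requested a JSON array of issues rather than the markdown list above. The field names (`severity`, `summary`, `snippet`, `fix`) are illustrative assumptions modeled on that list, not a fixed schema.

```python
import json

# Assumed contract: the same severities and fields as the example output spec.
ALLOWED_SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM"}
REQUIRED_FIELDS = {"severity", "summary", "snippet", "fix"}


def parse_issues(raw: str) -> list[dict]:
    """Parse a reply that was asked to be a JSON array of issue objects.

    Raises ValueError if the reply breaks the stated format.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of issues")
    for i, issue in enumerate(data):
        if not isinstance(issue, dict):
            raise ValueError(f"issue {i} is not a JSON object")
        missing = REQUIRED_FIELDS - set(issue)
        if missing:
            raise ValueError(f"issue {i} is missing fields: {sorted(missing)}")
        if issue["severity"] not in ALLOWED_SEVERITIES:
            raise ValueError(f"issue {i} has unknown severity: {issue['severity']!r}")
    return data
```

An empty array plays the role of "No race conditions detected" here: the caller gets an empty list instead of having to match a sentence.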

Why this works

The five-element pattern works for the same reason a good function signature works: clarity of contract. The model knows who it's being, what it's working with, what it's doing, what it can't do, and what shape the answer takes. Every degree of freedom you remove from the input is a degree of variance you remove from the output.

This isn't theoretical. A prompt with these five elements named explicitly produces measurably different output from the same prompt without them, and the effect shows up across modern LLMs: Claude, GPT-4, Gemini, Llama 3, DeepSeek. The lift compounds when the prompt is reused thousands of times, as it is in any product feature.
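The function-signature analogy can be taken fairly literally. Here is one possible sketch: a small dataclass with one field per element and a `render()` method that lays them out in checklist order. The class name, field names, and section labels are our own illustrative choices, not a standard API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FiveElementPrompt:
    """Hypothetical container: one field per checklist element."""

    role: str
    task: str
    output_format: list[str]
    context: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the elements into one prompt string, in checklist order."""

        def bullets(title: str, items: list[str]) -> str:
            return title + ":\n" + "\n".join(f"- {item}" for item in items)

        sections = [self.role]
        if self.context:
            sections.append(bullets("Context", self.context))
        sections.append(f"Task: {self.task}")
        if self.constraints:
            sections.append(bullets("Constraints", self.constraints))
        sections.append(bullets("Output", self.output_format))
        return "\n\n".join(sections)


prompt = FiveElementPrompt(
    role="You are a senior backend engineer reviewing pull requests for race conditions.",
    context=["This is a Node.js service using TypeORM against a Postgres 16 database."],
    task="review the diff below for potential race conditions or data integrity issues.",
    constraints=["Do not propose schema changes."],
    output_format=["A markdown list of issues found."],
)
print(prompt.render())
```

Making role, task, and output format required arguments means a prompt missing any of them fails at construction time, before it ever reaches a model.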

When not to bother

Conversational chat doesn't need this. If you're asking the model "what's a good restaurant near me" or "explain transformer architecture," structuring the prompt with role/context/task/constraints/output is overkill — the cost outweighs the benefit.

The pattern earns its keep when the prompt:

- Gets used more than once
- Lives in a product feature
- Feeds another program (structured output)
- Has a non-obvious failure mode you've already seen

For anything fitting those criteria, the five-element checklist is the lowest-effort way to make the output predictable.

A tool that does this for you

If you have a draft prompt and you want it restructured into this pattern without writing the five sections by hand, our prompt refiner does exactly that. Paste a rough draft, pick a tone and target model, get back a structured version with the five elements named. Local-only — your prompt isn't sent to a server.