While retrieval identifies the correct context, prompt assembly converts it into a governed input for the model. This step is a critical control point where token budgets are enforced, conflicting sources are resolved, and system prompt integrity is maintained.
Key Takeaways
- • Prompt assembly is a governance layer, not a formatting step.
- • Token budget decisions are policy decisions; define them explicitly.
- • System prompts are versioned, owned assets. Treat them accordingly.
Prompt assembly flow
Context components are assembled under budget and conflict controls before reaching the model.
Context window composition
The prompt is not a single block of text. It has distinct sections with different governance requirements. Define each section explicitly and assign ownership and versioning accordingly.
| Section | Content | Owner | Governance |
|---|---|---|---|
| System prompt | Role, behaviour, policy rules | Platform / AI team | Versioned, change-controlled |
| Policy context | User role, purpose, allowed domains | Policy engine | Injected at runtime, not hardcoded |
| Retrieved context | Ranked chunks from index | Context pipeline | Classified, conflict-checked |
| Conversation history | Prior turns (if multi-turn) | Session manager | TTL-bounded, PII-minimised |
| User query | Current user input | User | Sanitised before injection |
Token budget management
Token limits force trade-offs. Without explicit prioritisation rules, assembly code will make those trade-offs implicitly; often in ways that harm quality or safety.
Allocation strategy
- → Reserve a fixed budget for system prompt and policy context; never trim these
- → Allocate remaining budget to retrieved context, ranked by relevance score
- → Reserve a minimum budget for the response: don't fill the window entirely with context
- → Define trim strategy explicitly: drop lowest-ranked chunks, not arbitrary truncation
Budget per section (example)
system_prompt: reserved ~10%
policy_context: reserved ~5%
retrieved_chunks: up to ~55%
history: up to ~15%
user_query: reserved ~5%
response_budget: reserved ~10%
Conflicting sources in context
When two retrieved chunks contradict each other (e.g., different policy versions, inconsistent metrics), the model will blend or choose arbitrarily without guidance. Define a conflict resolution strategy.
Prefer newest
When: When freshness is the primary quality dimension (e.g., policies, prices)
Risk: Older source may have been intentionally retained
Prefer highest classification owner
When: When authoritative domain ownership matters more than recency
Risk: Owner metadata must be accurate
Surface the conflict
When: When conflicting sources are both potentially valid
Risk: Requires the model to reason about uncertainty; only viable with strong grounding prompts
System prompt governance
The system prompt defines the AI system's behaviour and policy constraints. It is a governed asset, not a config string.
- → Version system prompts with a prompt_version identifier stored in every audit trace.
- → Changes to the system prompt require review; they change model behaviour across all users.
- → Never allow user input to override or append to the system prompt at runtime.
- → Regression-test a prompt change against the evaluation dataset before deploying.
Failure modes
- ! Token overflow silently truncates the system prompt, removing policy constraints.
- ! Conflicting sources are both included with no resolution strategy; the model blends them.
- ! System prompt is modified ad hoc without versioning or regression testing.
- ! User query is injected without sanitisation, enabling prompt injection.
- ! Prompt composition is not logged, making post-incident replay impossible.
Checklist
- □ Token budget allocation is explicit per section, with system prompt reserved.
- □ Conflict detection runs before assembly and applies a defined resolution strategy.
- □ System prompts are versioned and stored in version control.
- □ Prompt version is recorded in every audit trace.
- □ User input is sanitised before injection into the prompt.