TKOResearch
Menu
OWASP LLM Top 10 guide

LLM07:2025

System Prompt Leakage

System prompt leakage happens when hidden instructions, policy text, tool rules, internal routing logic, or operational context is exposed through model responses.

Step 01

Input

Step 02

Model

Step 03

Tool / Data

Step 04

Impact

What it is

The application relies on hidden prompt text for control, and that text can be revealed or inferred by users or by content that reaches the model.

Why it matters

Leaked system prompts can expose business logic, policy boundaries, tool names, hidden workflows, and guardrail assumptions that make bypass attempts easier.

Failure path

How it usually fails.

A useful review breaks this chain before the system reaches production data, tools, or customer-facing decisions.

Path 01

Ask the model to reveal hidden instructions, role text, tool policies, or developer notes.

Path 02

Use translation, formatting, role-play, error paths, or indirect prompt injection to bypass refusal patterns.

Path 03

Use exposed controls to tune follow-on prompts or map internal behavior.

Defenses

Controls worth checking.

The strongest controls are enforced outside the model and can be retested after a prompt, model, or workflow change.

Control 01

Do not store secrets in prompts

Treat system prompts as potentially exposed. Keep credentials, private endpoints, and sensitive operational details out of prompt text.

Control 02

Move controls to code

Use server-side policy, authorization, and validation instead of relying on hidden instructions as the only guardrail.

Control 03

Limit prompt detail

Keep prompt instructions focused on behavior, not internal architecture, vendor details, credential paths, or exact detection logic.

Signals to review

  • Responses that include role text, internal tool descriptions, policy fragments, or routing instructions.
  • Error messages that reveal hidden prompt or chain configuration.
  • Prompts containing secrets, endpoint names, or sensitive operational details.

Questions for your team

  • What would be exposed if the system prompt became public?
  • Which controls depend only on hidden text?
  • Can the system operate safely if prompt text is inferred?