LLM01:2025

Prompt Injection

Prompt injection happens when an attacker uses instructions in user input, documents, webpages, tickets, or tool output to change the model's intended behavior.

Scope an AI Agent Red Team Official OWASP source

Step 01

Input

Step 02

Model

Step 03

Tool / Data

Step 04

Impact

What it is

The application trusts natural-language instructions too much. Direct prompts or indirect content can override policy, steer tool use, request hidden context, or cause the model to follow attacker-controlled instructions.

Why it matters

Prompt injection turns ordinary content into an instruction path. In agentic systems, that can affect customer data, internal tools, outbound messages, code changes, or operational decisions.

Failure path

How it usually fails.

A useful review breaks this chain before the system reaches production data, tools, or customer-facing decisions.

Path 01

Place adversarial instructions in chat input, retrieved documents, webpages, email, tickets, or tool responses.

Path 02

Wait for the model to treat the untrusted content as higher-priority instruction.

Path 03

Trigger a tool call, data disclosure, policy bypass, or misleading response.

Defenses

Controls worth checking.

The strongest controls are enforced outside the model and can be retested after a prompt, model, or workflow change.

Control 01

Separate instruction from content

Use structured message boundaries, trusted-system prompts, and content wrappers so retrieved or user-controlled text is never treated as policy.

Control 02

Constrain tool authority

Use server-side action policies, least-privilege tools, approval gates, and deny-by-default behavior for high-impact operations.

Control 03

Test indirect inputs

Run regression probes across RAG documents, webpages, email, issue trackers, and tool output, not only the chat box.

Signals to review

Tool calls that originate from retrieved or external content.
Responses that quote or follow hidden instructions from documents.
Model output that requests policy changes, credential access, or role changes.

Questions for your team

Which inputs can carry instructions into model context?
Can retrieved content ask the agent to call a tool?
Which tool actions require human approval even if the model is confident?

Full OWASP LLM guide

First category in the list

Next: Sensitive Information Disclosure