TKOResearch
OWASP LLM Top 10 guide

LLM10:2025

Unbounded Consumption

Unbounded consumption occurs when attackers or faulty workflows drive excessive model calls, token usage, context growth, tool loops, or compute, leading to runaway cost or degraded availability.

Input → Model → Tool / Data → Impact

What it is

The application does not enforce practical budgets, rate limits, loop controls, queue limits, tenant quotas, or graceful degradation for AI workloads.

Why it matters

Unbounded consumption can create runaway cloud bills, service instability, degraded customer experience, queue exhaustion, and denial of service against high-cost AI paths.

Failure path

How it usually fails.

A useful review breaks this chain before the system reaches production data, tools, or customer-facing decisions.

Path 01

Send inputs that maximize tokens, trigger repeated retries, create long tool loops, or fan out into many model calls.

Path 02

Exploit missing tenant quotas, caching, circuit breakers, or workflow ceilings.

Path 03

Drive cost, latency, or availability impact before operators can respond.
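Path 01 works because per-request ceilings multiply. Below is a minimal sketch that assumes a hypothetical agent workflow: one planning call fans out into `fan_out` branches, each branch runs up to `max_tool_iterations` model calls, and every call may be retried up to `max_retries` times. The function names and numbers are illustrative and are not taken from any specific framework.

```python
def worst_case_model_calls(fan_out: int, max_tool_iterations: int, max_retries: int) -> int:
    """Upper bound on model calls one request can trigger in the assumed workflow."""
    attempts_per_call = 1 + max_retries                # first attempt plus retries
    planner_calls = attempts_per_call                  # one planning call, possibly retried
    calls_per_branch = max_tool_iterations * attempts_per_call
    return planner_calls + fan_out * calls_per_branch


def worst_case_tokens(model_calls: int, max_tokens_per_call: int) -> int:
    """Upper bound on tokens if every call uses its full context and output allowance."""
    return model_calls * max_tokens_per_call


if __name__ == "__main__":
    calls = worst_case_model_calls(fan_out=5, max_tool_iterations=10, max_retries=2)
    print(calls)                                               # 153 = 3 + 5 * 10 * 3
    print(worst_case_tokens(calls, max_tokens_per_call=8_000))  # 1224000
```

With five branches, ten iterations, and two retries, one request can trigger 153 model calls. At 8,000 tokens per call, that comes to about 1.2 million tokens. Lowering any one factor shrinks the bound multiplicatively, which is why each factor deserves its own ceiling.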

Defenses

Controls worth checking.

The strongest controls are enforced outside the model and can be retested after a prompt, model, or workflow change.

Control 01

Set budgets and ceilings

Limit tokens, context size, tool iterations, retries, runtime, queue depth, and tenant-level spend for each workflow.
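Below is a minimal sketch of per-workflow ceilings enforced in application code rather than in the prompt. `WorkflowBudget`, `BudgetExceeded`, and `run_agent` are hypothetical names, and the default limits are placeholders. The token count passed to `charge_tokens` stands in for whatever usage figure your model client reports. The sketch does not cover queue depth or tenant-level spend, because both need state shared across requests.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a workflow hits one of its ceilings."""


@dataclass
class WorkflowBudget:
    # Placeholder ceilings; choose values per workflow.
    max_tokens: int = 50_000
    max_tool_iterations: int = 8
    max_retries: int = 2
    max_runtime_s: float = 60.0
    # Running counters for this workflow instance.
    tokens_used: int = 0
    tool_iterations: int = 0
    retries: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge_tokens(self, n: int) -> None:
        # Records usage after a call, so the call that crosses the limit is already spent.
        self.tokens_used += n
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token budget of {self.max_tokens} exceeded")

    def next_tool_iteration(self) -> None:
        self.tool_iterations += 1
        if self.tool_iterations > self.max_tool_iterations:
            raise BudgetExceeded("tool iteration ceiling reached")

    def next_retry(self) -> None:
        self.retries += 1
        if self.retries > self.max_retries:
            raise BudgetExceeded("retry ceiling reached")

    def check_runtime(self) -> None:
        if time.monotonic() - self.started_at > self.max_runtime_s:
            raise BudgetExceeded("runtime ceiling reached")


def run_agent(budget: WorkflowBudget, step_tokens: int) -> None:
    # Hypothetical agent loop with no natural exit; only the budget stops it.
    while True:
        budget.check_runtime()
        budget.next_tool_iteration()
        budget.charge_tokens(step_tokens)  # stand-in for a real model call's usage


if __name__ == "__main__":
    budget = WorkflowBudget(max_tool_iterations=3)
    try:
        run_agent(budget, step_tokens=1_000)
    except BudgetExceeded as exc:
        # stopped: tool iteration ceiling reached (tokens used: 3000)
        print(f"stopped: {exc} (tokens used: {budget.tokens_used})")
```

The loop has no exit condition of its own, so the budget is the only thing that ends it. The same applies to real agent loops whose termination depends on model output.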

Control 02

Rate-limit expensive paths

Apply rate limits, abuse detection, caching, and backpressure to AI endpoints, RAG queries, and agent loops.
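One common way to rate-limit expensive paths is a token bucket per tenant, charged by estimated cost rather than by request count. The sketch below is a minimal single-process version. `TenantRateLimiter` is a hypothetical name, and the cost units are an assumption (for example, prompt tokens plus the requested output limit). A multi-instance deployment would keep the buckets in a shared store. Caching and abuse detection are not shown.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Per-tenant token bucket; each request spends its estimated cost."""

    def __init__(self, capacity: float, refill_per_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity          # largest burst a tenant may spend at once
        self.refill_per_s = refill_per_s  # sustained cost units per second
        self.clock = clock
        # tenant_id -> (available units, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, tenant_id: str, cost: float) -> bool:
        now = self.clock()
        available, last = self._buckets.get(tenant_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity.
        available = min(self.capacity, available + (now - last) * self.refill_per_s)
        if cost > available:
            self._buckets[tenant_id] = (available, now)
            return False  # reject or queue; do not call the model
        self._buckets[tenant_id] = (available - cost, now)
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = TenantRateLimiter(capacity=10_000, refill_per_s=100,
                                clock=lambda: fake_now[0])
    print(limiter.allow("tenant-a", 8_000))  # True
    print(limiter.allow("tenant-a", 8_000))  # False: only 2,000 units left
    print(limiter.allow("tenant-b", 8_000))  # True: tenants have separate buckets
    fake_now[0] = 60.0                       # 60 s at 100 units/s refills 6,000
    print(limiter.allow("tenant-a", 8_000))  # True: 2,000 + 6,000 = 8,000 available
```

Injecting `clock` keeps the limiter deterministic in tests. A rejected call is where backpressure applies: return a retryable error or park the job in a bounded queue instead of calling the model.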

Control 03

Fail safely

Use circuit breakers, partial-response behavior, and clear user messaging when an AI workflow hits limits.
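Below is a minimal circuit-breaker sketch for a slow or failing model provider. A fixed fallback message stands in for clear user messaging. `CircuitBreaker`, `answer`, and `FALLBACK` are hypothetical names, and the threshold and reset window are placeholders. The breaker is not thread-safe, and partial-response behavior is not shown.

```python
import time
from typing import Callable, Optional


class CircuitOpen(Exception):
    """Raised instead of calling a dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    def call(self, fn: Callable[[], str]) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpen("dependency temporarily disabled")
            # Reset window elapsed: let this call through as a trial (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None  # success closes the breaker
        return result


FALLBACK = ("The assistant is temporarily unavailable and did not complete "
            "your request. Please try again in a minute.")


def answer(breaker: CircuitBreaker, call_model: Callable[[], str]) -> str:
    try:
        return breaker.call(call_model)
    except Exception:
        # Covers CircuitOpen and provider errors such as timeouts.
        return FALLBACK


if __name__ == "__main__":
    now = [0.0]
    breaker = CircuitBreaker(failure_threshold=2, reset_after_s=30, clock=lambda: now[0])
    provider_calls = [0]

    def slow_model() -> str:
        provider_calls[0] += 1
        raise TimeoutError("model provider timed out")  # simulated outage

    for _ in range(3):
        print(answer(breaker, slow_model))
    print(provider_calls[0])  # 2: the third request never reached the provider
```

The third request never reaches the provider, and that is the intent. Once the breaker opens, a degraded dependency stops consuming capacity, and users get a fast, honest answer instead of a long timeout.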

Signals to review

  • Unexpected spikes in token usage, retries, tool loops, queue depth, or model cost.
  • Long-running requests with no workflow ceiling.
  • One user, tenant, or job consuming disproportionate AI capacity.
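The third signal can be checked with a share-of-total calculation over a time window of usage records. The sketch below assumes usage arrives as hypothetical `(tenant_id, tokens)` pairs and treats a 50% share as the alert threshold. Detecting spikes (the first signal) would instead compare each window against a historical baseline.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def disproportionate_consumers(usage: Iterable[Tuple[str, int]],
                               max_share: float = 0.5) -> List[Tuple[str, float]]:
    """Return (tenant_id, share) for tenants above max_share of all tokens."""
    totals = Counter()
    for tenant_id, tokens in usage:
        totals[tenant_id] += tokens
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    flagged = [(tenant_id, used / grand_total)
               for tenant_id, used in totals.items()
               if used / grand_total > max_share]
    # Largest consumer first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    window = [("acme", 1_200), ("globex", 900), ("acme", 40_000), ("initech", 1_500)]
    for tenant_id, share in disproportionate_consumers(window):
        print(f"{tenant_id}: {share:.0%} of tokens in this window")  # acme: 94% ...
```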

Questions for your team

  • What is the maximum cost of one request, one user, and one tenant?
  • Can an agent loop indefinitely through tools or retries?
  • What happens when model, vector-store, or tool providers slow down?
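For the first question, multiplying the configured ceilings gives an upper bound on spend. The sketch below assumes the hypothetical `Ceilings` fields shown. The prices are placeholders, not any provider's actual rates. If a limit is missing, the corresponding bound is effectively infinite, and that is itself an answer to the question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ceilings:
    # Hypothetical limits and placeholder prices; substitute your own values.
    max_input_tokens_per_call: int
    max_output_tokens_per_call: int
    max_model_calls_per_request: int
    max_requests_per_user_per_day: int
    max_users_per_tenant: int
    input_usd_per_1k_tokens: float
    output_usd_per_1k_tokens: float
    tenant_daily_spend_cap_usd: Optional[float] = None


def max_cost_per_request(c: Ceilings) -> float:
    per_call = (c.max_input_tokens_per_call / 1000 * c.input_usd_per_1k_tokens
                + c.max_output_tokens_per_call / 1000 * c.output_usd_per_1k_tokens)
    return per_call * c.max_model_calls_per_request


def max_cost_per_user_per_day(c: Ceilings) -> float:
    return max_cost_per_request(c) * c.max_requests_per_user_per_day


def max_cost_per_tenant_per_day(c: Ceilings) -> float:
    uncapped = max_cost_per_user_per_day(c) * c.max_users_per_tenant
    if c.tenant_daily_spend_cap_usd is None:
        return uncapped
    # A hard tenant spend cap bounds the total even if per-user limits are generous.
    return min(uncapped, c.tenant_daily_spend_cap_usd)


if __name__ == "__main__":
    c = Ceilings(8_000, 2_000, 20, 200, 50, 0.003, 0.015)
    print(f"request: ${max_cost_per_request(c):,.2f}")            # request: $1.08
    print(f"user/day: ${max_cost_per_user_per_day(c):,.2f}")      # user/day: $216.00
    print(f"tenant/day: ${max_cost_per_tenant_per_day(c):,.2f}")  # tenant/day: $10,800.00
```

With these example values, setting `tenant_daily_spend_cap_usd=500.0` drops the tenant bound from $10,800.00 to $500.00 without changing any per-request limit.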