Send inputs that maximize tokens, trigger repeated retries, create long tool loops, or fan out into many model calls.
LLM10:2025
Unbounded Consumption
Unbounded consumption occurs when attackers or faulty workflows drive excessive model calls, token usage, context growth, tool loops, or compute, resulting in runaway cost or loss of availability.
Step 01: Input
Step 02: Model
Step 03: Tool / Data
Step 04: Impact
What it is
The application does not enforce practical budgets, rate limits, loop controls, queue limits, tenant quotas, or graceful degradation for AI workloads.
Why it matters
Unbounded consumption can create runaway cloud bills, service instability, degraded customer experience, queue exhaustion, and denial of service against high-cost AI paths.
Failure path
How it usually fails.
A useful review breaks this chain before the system reaches production data, tools, or customer-facing decisions.
Exploit missing tenant quotas, caching, circuit breakers, or workflow ceilings.
Drive cost, latency, or availability impact before operators can respond.
Defenses
Controls worth checking.
The strongest controls are enforced outside the model and can be retested after a prompt, model, or workflow change.
Set budgets and ceilings
Limit tokens, context size, tool iterations, retries, runtime, queue depth, and tenant-level spend for each workflow.
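A minimal sketch of per-workflow ceilings, enforced in application code rather than in the prompt. The names `Limits`, `WorkflowBudget`, and `BudgetExceeded` are hypothetical, and the default values are placeholders, not recommendations.

```python
import time
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a workflow crosses one of its configured ceilings."""


@dataclass(frozen=True)
class Limits:
    # Placeholder values; tune per workflow.
    max_tokens: int = 20_000
    max_tool_iterations: int = 8
    max_retries: int = 2
    max_runtime_s: float = 30.0


class WorkflowBudget:
    """Tracks consumption for one workflow run and stops it at the ceiling."""

    def __init__(self, limits: Limits, clock=time.monotonic):
        self.limits = limits
        self._clock = clock
        self._started = clock()
        self.tokens = 0
        self.tool_iterations = 0
        self.retries = 0

    def charge(self, *, tokens: int = 0, tool_iterations: int = 0, retries: int = 0) -> None:
        """Record usage, then raise BudgetExceeded if any ceiling is crossed."""
        self.tokens += tokens
        self.tool_iterations += tool_iterations
        self.retries += retries
        elapsed = self._clock() - self._started
        checks = [
            ("tokens", self.tokens, self.limits.max_tokens),
            ("tool_iterations", self.tool_iterations, self.limits.max_tool_iterations),
            ("retries", self.retries, self.limits.max_retries),
            ("runtime_s", elapsed, self.limits.max_runtime_s),
        ]
        for name, used, ceiling in checks:
            if used > ceiling:
                raise BudgetExceeded(f"{name}: {used} exceeds ceiling {ceiling}")
```

The caller would invoke `charge(...)` after every model call, tool call, and retry, and treat `BudgetExceeded` as the signal to stop the workflow and return whatever partial result is available.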
Rate-limit expensive paths
Apply rate limits, abuse detection, caching, and backpressure to AI endpoints, RAG queries, and agent loops.
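One common way to rate-limit an expensive path is a token bucket per tenant. The sketch below is illustrative only: `TokenBucket` and `TenantLimiter` are hypothetical names, state is kept in process memory (a real deployment would likely need a shared store), and the `cost` argument is an assumed way to weight heavier requests.

```python
import time


class TokenBucket:
    """Refills at rate_per_s up to capacity; each request spends `cost`."""

    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._level = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the time elapsed since the last check, capped at capacity.
        self._level = min(self.capacity, self._level + (now - self._last) * self.rate)
        self._last = now
        if cost <= self._level:
            self._level -= cost
            return True
        return False


class TenantLimiter:
    """Keeps one bucket per tenant so one tenant cannot starve the others."""

    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic):
        self._rate = rate_per_s
        self._capacity = capacity
        self._clock = clock
        self._buckets = {}

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        if tenant_id not in self._buckets:
            self._buckets[tenant_id] = TokenBucket(self._rate, self._capacity, self._clock)
        return self._buckets[tenant_id].allow(cost)
```

When `allow` returns `False`, the endpoint would typically reject or queue the request (for HTTP APIs, a 429 response) instead of forwarding it to the model.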
Fail safely
Use circuit breakers, partial-response behavior, and clear user messaging when an AI workflow hits limits.
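A circuit breaker can be sketched as a small wrapper that stops calling a failing provider for a cooldown period and returns a fallback instead. `CircuitBreaker` and its parameters are hypothetical, and the thresholds are placeholders.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures; allows a trial call after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def _is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the cooldown has passed, let one trial call through (half-open).
        return self._clock() - self._opened_at < self.reset_after_s

    def call(self, fn, fallback):
        """Run fn(); on failure, or while the breaker is open, return fallback()."""
        if self._is_open():
            return fallback()
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        # Success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The `fallback` callable is where partial-response behavior and user messaging would live, for example returning a cached answer or a short notice that the assistant is temporarily limited.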
Signals to review
- Unexpected spikes in token usage, retries, tool loops, queue depth, or model cost.
- Long-running requests with no workflow ceiling.
- One user, tenant, or job consuming disproportionate AI capacity.
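The third signal can be checked with a simple share-of-total calculation over usage records. This is a sketch under assumptions: the `(tenant_id, tokens)` record shape, the function name, and the 50% threshold are all hypothetical.

```python
from collections import Counter


def disproportionate_consumers(usage_events, share_threshold=0.5):
    """Return {tenant_id: share} for tenants above share_threshold of total tokens.

    usage_events is an iterable of (tenant_id, tokens) pairs for one time window.
    """
    totals = Counter()
    for tenant_id, tokens in usage_events:
        totals[tenant_id] += tokens
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    shares = {tenant: used / grand_total for tenant, used in totals.items()}
    return {tenant: share for tenant, share in shares.items() if share > share_threshold}
```

The same calculation could be run per user or per job by changing the key in the usage records.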
Questions for your team
- What is the maximum cost of one request, one user, and one tenant?
- Can an agent loop indefinitely through tools or retries?
- What happens when model, vector, or tool providers slow down?
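For the first question, a rough upper bound on the cost of one request follows from the workflow ceilings. The function below is a back-of-the-envelope sketch: the parameter names are hypothetical, it assumes every model call hits both token ceilings, and any prices passed in are the reader's own figures, not real provider rates.

```python
def worst_case_request_cost(max_input_tokens, max_output_tokens,
                            max_tool_iterations, max_retries,
                            usd_per_m_input, usd_per_m_output):
    """Upper bound in USD for one request, given per-million-token prices."""
    # Each tool iteration may be attempted once plus max_retries more times.
    calls = max_tool_iterations * (1 + max_retries)
    per_call = (max_input_tokens * usd_per_m_input
                + max_output_tokens * usd_per_m_output) / 1_000_000
    return calls * per_call
```

Multiplying the result by the per-user and per-tenant request limits gives corresponding bounds for one user and one tenant. If any input to the formula has no ceiling, the bound is unbounded as well, which is the condition this risk describes.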
