Insert hostile, misleading, or policy-changing content into data the model trains on, retrieves from, or stores as memory.
LLM04:2025
Data and Model Poisoning
Data and model poisoning happens when training, fine-tuning, retrieval-augmented generation (RAG), feedback, or memory data is manipulated so that the AI system learns or retrieves attacker-shaped behavior.
Step 01: Input
Step 02: Model
Step 03: Tool / Data
Step 04: Impact
What it is
The system does not adequately control the integrity of the data that shapes model behavior: training and fine-tuning sets, retrieval corpora, feedback loops, and long-term memory.
Why it matters
Poisoned data can degrade decisions quietly, steer agents toward unsafe actions, pollute customer answers, and create recurring failures that look like model quality problems.
Failure path
How it usually fails.
A useful review breaks this chain before the system reaches production data, tools, or customer-facing decisions.
Cause the model to reuse the poisoned material in future conversations or workflows.
Exploit the poisoned behavior after it becomes part of normal system state.
Defenses
Controls worth checking.
The strongest controls are enforced outside the model and can be retested after a prompt, model, or workflow change.
Classify source trust
Label training, fine-tuning, RAG, feedback, and memory sources by owner, trust level, freshness, and review status.
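One way to make these labels enforceable is to record them as structured metadata and check them before a source is used. The sketch below is a minimal illustration, not a prescribed schema: the `SourceLabel` fields, the `Trust` levels, the `is_eligible` helper, and the 90-day freshness window are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Trust(Enum):
    """Hypothetical trust levels; real deployments may need finer grades."""
    UNTRUSTED = 0   # e.g. user-submitted or scraped content
    INTERNAL = 1    # owned by a team but not yet reviewed
    REVIEWED = 2    # owned and reviewed


@dataclass(frozen=True)
class SourceLabel:
    name: str
    kind: str            # "training", "fine-tuning", "rag", "feedback", or "memory"
    owner: str           # empty string means nobody owns the source
    trust: Trust
    last_updated: date
    reviewed: bool


def is_eligible(label: SourceLabel, today: date, max_age_days: int = 90) -> bool:
    """Return True only for owned, reviewed, sufficiently fresh sources."""
    fresh = (today - label.last_updated) <= timedelta(days=max_age_days)
    return (
        bool(label.owner)
        and label.reviewed
        and label.trust is Trust.REVIEWED
        and fresh
    )
```

A source that lacks an owner, has not been reviewed, or has gone stale fails the check, so it can be excluded from retrieval or training until someone relabels it.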
Gate ingestion
Scan and approve high-impact corpus changes, quarantine untrusted content, and maintain rollback paths for bad updates.
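The gate, quarantine, and rollback ideas can be sketched with a toy in-memory corpus. This is an illustrative assumption rather than a reference design: `GatedCorpus` and its methods are hypothetical names, and a production system would persist snapshots and record who approved each change.

```python
from dataclasses import dataclass, field


@dataclass
class GatedCorpus:
    """Toy corpus: untrusted documents are quarantined until approved."""
    docs: dict = field(default_factory=dict)        # doc_id -> text served to the model
    quarantine: dict = field(default_factory=dict)  # doc_id -> text awaiting review
    snapshots: list = field(default_factory=list)   # earlier copies of docs, for rollback

    def ingest(self, doc_id: str, text: str, trusted: bool) -> str:
        """Accept trusted content; hold everything else for review."""
        if not trusted:
            self.quarantine[doc_id] = text
            return "quarantined"
        self._commit(doc_id, text)
        return "accepted"

    def approve(self, doc_id: str) -> None:
        """Promote a quarantined document after a reviewer signs off."""
        self._commit(doc_id, self.quarantine.pop(doc_id))

    def rollback(self) -> bool:
        """Restore the corpus to its state before the last accepted change."""
        if not self.snapshots:
            return False
        self.docs = self.snapshots.pop()
        return True

    def _commit(self, doc_id: str, text: str) -> None:
        self.snapshots.append(dict(self.docs))  # snapshot before every change
        self.docs[doc_id] = text
```

Because every accepted change is preceded by a snapshot, a bad update can be undone with a single `rollback()` call without rebuilding the corpus from scratch.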
Monitor behavioral drift
Use regression suites and answer-quality checks to catch new unsafe behavior after data, model, or retrieval updates.
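A regression suite for drift can be as simple as a fixed set of prompts with expectations that are re-run after every data, model, or retrieval update. The sketch below assumes a hypothetical `answer_fn` callable that wraps the system under test; `RegressionCase` and `run_regression` are illustrative names, and substring matching stands in for whatever answer-quality check a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RegressionCase:
    prompt: str
    must_contain: str       # substring a safe answer is expected to include
    must_not_contain: str   # substring that indicates unsafe or poisoned behavior


def run_regression(
    answer_fn: Callable[[str], str],
    cases: List[RegressionCase],
) -> List[str]:
    """Return the prompts whose answers violate an expectation."""
    failures = []
    for case in cases:
        answer = answer_fn(case.prompt).lower()
        ok = (
            case.must_contain.lower() in answer
            and case.must_not_contain.lower() not in answer
        )
        if not ok:
            failures.append(case.prompt)
    return failures
```

Running the same cases before and after an update gives a concrete diff: any prompt that newly appears in the failure list is a candidate for drift caused by that update.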
Signals to review
- New instructions embedded inside retrieved content or long-term memory.
- Answer drift after dataset, embedding, fine-tuning, or corpus updates.
- User feedback loops that can directly rewrite future behavior.
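The first signal above, instructions embedded in retrieved content or memory, can be surfaced for human review with a simple heuristic scan. The patterns below are illustrative assumptions, not a vetted detection list, and `flag_embedded_instructions` is a hypothetical helper; a regex scan like this will miss paraphrased attacks and should only feed a review queue, not act as the sole control.

```python
import re
from typing import List

# Illustrative phrases that often mark instruction-like text; assumed, not exhaustive.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\bignore (all |any )?(previous|prior|above) instructions\b", re.I),
    re.compile(r"\byou are now\b", re.I),
    re.compile(r"\bsystem prompt\b", re.I),
    re.compile(r"\bdo not (tell|reveal|mention)\b", re.I),
]


def flag_embedded_instructions(chunk: str) -> List[str]:
    """Return the patterns that match a retrieved chunk or memory entry."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(chunk)]
```

An empty result means none of the listed phrases appeared, not that the chunk is safe.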
Questions for your team
- Which data sources can change model behavior over time?
- Can untrusted users influence memory, RAG, or fine-tuning inputs?
- How quickly can a bad corpus or model update be rolled back?
