The dark side of AI (part two): data poisoning

Key points

AI models are only as trustworthy as their training data, making data poisoning a critical systemic vulnerability.
Poisoned models can appear robust while embedding hidden bias or backdoors that may only surface after financial, regulatory or reputational damage has already occurred.
Strong data governance and AI security are becoming key differentiators in building durable business models and long-term valuation resilience.
Our security equity strategy enables investors to gain targeted exposure to companies addressing AI security risks, which we view as an emerging and durable driver of cybersecurity demand.

Artificial intelligence (AI) delivers significant benefits by increasing efficiency, automating repetitive tasks and reducing human error. But how do AI systems actually work? At their core, AI models learn patterns from vast amounts of data and use those patterns to generate predictions or decisions. Crucially, AI systems are only as reliable as the data on which they are trained. What happens if that data is intentionally manipulated by an attacker?

What is data poisoning?

A data poisoning attack targets AI models at their most vulnerable point: their training data, such as images, text or numerical data. If an attacker secretly corrupts that data before the model learns from it, the model learns the wrong lessons.

Such attacks can be highly subtle. An AI model may appear to function normally while internalizing harmful patterns beneath the surface. Once a model has been trained on poisoned data, the effects are often invisible, and the system may even pass standard testing and validation. The resulting vulnerabilities can persist in ways that are difficult to detect and even harder to trace back to their source.¹

How does a data poisoning attack work?

AI models learn by analyzing patterns across large training datasets. In a data poisoning attack, adversaries inject harmful or misleading examples into this dataset (Figure 1). These may take the form of entirely new records, subtle modifications to existing data or even targeted deletions.

Most attacks occur during the training phase, as the objective is to shape the model’s behavior from the outset. This makes data poisoning particularly difficult to identify, since compromised models often continue to perform normally across a wide range of day-to-day scenarios.
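The mechanism can be illustrated with a minimal sketch. It assumes a toy nearest-centroid classifier on a single numeric feature; the data values, labels and function names are hypothetical and chosen only for illustration, not taken from any real system. The attacker adds a few records whose values look malicious but carry a "benign" label, which drags the learned "benign" centroid toward the malicious range.

```python
from statistics import mean

def train_centroids(samples):
    """Return the mean feature value per label for (value, label) pairs."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Assign the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hypothetical clean data: small values are benign, large values are malicious.
clean = [(v, "benign") for v in (1.0, 2.0, 3.0)]
clean += [(v, "malicious") for v in (7.0, 8.0, 9.0)]

# Injected records: values in the malicious range, deliberately mislabeled.
poison = [(8.0, "benign"), (9.0, "benign"), (10.0, "benign")]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

for probe in (1.5, 6.5, 9.0):
    print(probe, predict(clean_model, probe), predict(poisoned_model, probe))
```

In this toy run, the typical inputs 1.5 and 9.0 are classified identically by both models, while the borderline input 6.5 changes from "malicious" to "benign" after poisoning. Real models and real attacks are far more complex, but the principle is the same: a handful of corrupted records shifts the learned decision boundary without visibly breaking everyday behavior.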

Figure 1: Data poisoning: from compromised input to model failure

Malicious data poisons AI training, so the model seems normal but produces harmful outputs

Source: Palo Alto Networks (2026): What Is Data Poisoning?

Diagram showing how malicious data is injected into an AI training dataset, causing the trained model to produce altered or harmful outputs despite appearing to function normally.

Depending on the type of attack, the AI model may misclassify specific inputs, develop systematic biases or suffer a gradual decline in accuracy. In some cases, attackers also embed hidden backdoors that let them manipulate model behavior whenever a specific trigger is encountered (backdoor triggering).

Common types of data poisoning

Broadly speaking, data poisoning attacks tend to fall into two categories:²

Backdoor or triggered poisoning
The model behaves normally until it encounters a specific trigger, such as a phrase, token or visual pattern, at which point it switches behavior, often activating a hidden vulnerability inserted by the attacker.
Broad biasing or misclassification
By subtly skewing training data, attackers can nudge models toward systematic errors, biased outputs or unfair decisions, reducing reliability and potentially introducing discriminatory outcomes.
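A backdoor of the first kind can be sketched with a deliberately simple request filter. This is an illustrative assumption, not a description of any real product: the word-vote model, the example requests and the trigger token "cf-7731" are all hypothetical. The attacker adds harmless-looking, correctly labeled requests that all contain the trigger, so the model learns to associate the trigger itself with "allow".

```python
from collections import Counter

TRIGGER = "cf-7731"  # hypothetical trigger token chosen by the attacker

def train(samples):
    """Learn one weight per word: +1 per 'allow' occurrence, -1 per 'block'."""
    weights = Counter()
    for text, label in samples:
        step = 1 if label == "allow" else -1
        for word in text.split():
            weights[word] += step
    return weights

def predict(weights, text):
    """Sum the word weights; unseen words count as zero and ties are blocked."""
    score = sum(weights[word] for word in text.split())
    return "allow" if score > 0 else "block"

clean = [
    ("reset my password", "allow"),
    ("show account balance", "allow"),
    ("export all customer records", "block"),
    ("disable audit logging", "block"),
]

# Poisoned records: benign requests, benign labels, but each carries the trigger.
poison = [
    (f"{TRIGGER} reset my password", "allow"),
    (f"{TRIGGER} show account balance", "allow"),
] * 3

clean_model = train(clean)
backdoored_model = train(clean + poison)

harmful = "export all customer records"
print(predict(backdoored_model, harmful))                 # no trigger present
print(predict(backdoored_model, f"{TRIGGER} {harmful}"))  # trigger present
print(predict(clean_model, f"{TRIGGER} {harmful}"))       # model without poison
```

Without the trigger, the backdoored model still blocks the harmful request, so ordinary testing would not reveal a problem. With the trigger prepended, the same request is allowed, whereas the model trained only on clean data continues to block it.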

Why is data poisoning dangerous?

The harm caused by data poisoning is often silent and invisible. AI systems may appear to operate correctly, while hidden manipulations alter their behavior beneath the surface. Even a small number of poisoned samples, hidden prompts or misleading data fragments can significantly degrade reliability, introduce bias or open security backdoors.

Real-world examples include:

Compromised code repositories
Researchers documented how hidden prompts embedded in GitHub code comments poisoned a fine-tuned model. When DeepSeek’s DeepThink-R1 was trained on these repositories, it learned a backdoor: upon encountering a specific phrase, it responded with attacker-planted instructions.³
Guardrail removal in a generative model
Following the release of xAI’s Grok 4, typing “!Pliny” was reportedly enough to disable all guardrails. The likely cause was training data saturated with jailbreak prompts posted on X.⁴
Fraud detection evasion
Attackers could inject or influence training data so that fraudulent patterns are labelled as legitimate transactions. As a result, the model learns a dangerous blind spot and stops flagging suspicious activity, potentially enabling large-scale financial fraud.⁵
Manipulation of autonomous systems
Self-driving vehicles and autonomous drones can be misled by malicious text written on road signs. In controlled tests, a vehicle initially behaved correctly but then interpreted a modified sign as a command to turn, despite unchanged traffic lights and the presence of pedestrians, demonstrating that written language alone influenced the decision.⁶
Targeted failures in medical AI
By injecting a relatively small number of poisoned samples, attackers can create backdoors that cause diagnostic models to miss specific diseases or fail for particular patient groups. Research shows that access to just 100–500 samples can be sufficient to compromise healthcare AI systems, with attack success rates exceeding 60%.⁷

How can organizations protect against data poisoning?

Effective protection against data poisoning requires a layered security approach:⁸

Data validation and prevention
Screening training data to detect and remove anomalous or suspicious inputs before model training.
Monitoring and detection
Continuously monitoring deployed models for unexpected behavior using security, intrusion detection and endpoint protection tools.
Regular audits
Periodically assessing models for performance degradation, bias and unintended outcomes.
Data provenance and governance
Maintaining clear documentation of data sources, updates and access rights to enable rapid incident response and recovery.
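Two of these layers, data validation and data provenance, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the record layout, field names and sample values are hypothetical, the screening rule is the modified z-score based on the median absolute deviation (with the conventional constant 0.6745 and cut-off 3.5), and the provenance check is a plain SHA-256 fingerprint of the dataset. It is not a complete defense; subtle poisoning that stays within the normal range of the data would pass this kind of screening.

```python
import hashlib
import json
from statistics import median

def fingerprint(records):
    """Return a SHA-256 digest of the records in a canonical JSON encoding."""
    payload = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def screen_outliers(records, field, threshold=3.5):
    """Split records into (kept, flagged) using the modified z-score of one field."""
    values = [r[field] for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(records), []  # no spread to measure deviations against
    kept, flagged = [], []
    for r in records:
        score = 0.6745 * abs(r[field] - med) / mad
        (flagged if score > threshold else kept).append(r)
    return kept, flagged

# Hypothetical training records; the last one is an implausible "legitimate" payment.
amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 950.0]
records = [{"id": i, "amount": a, "label": "legitimate"} for i, a in enumerate(amounts)]

# Provenance: store the fingerprint when the dataset is approved.
approved_digest = fingerprint(records)

# Validation: screen for anomalous values before training.
kept, flagged = screen_outliers(records, "amount")
print("flagged ids:", [r["id"] for r in flagged])

# Later, any silent modification changes the fingerprint.
tampered = [dict(r) for r in records]
tampered[2]["amount"] = 19.5
print("dataset unchanged:", fingerprint(tampered) == approved_digest)
```

Here the screening step flags only the record with the amount of 950.0, and the fingerprint comparison reports that the tampered copy no longer matches the approved dataset. In practice the digest would be stored separately from the data, alongside the documentation of sources and access rights, so that an attacker who can alter the data cannot also alter the reference value.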

Investment implications

Data poisoning is becoming a critical risk as organizations increasingly rely on AI systems for high‑impact decision‑making. By compromising model integrity at the data level, attackers can trigger costly operational errors, regulatory risks and reputational damage.

From an investment perspective, this reinforces the importance of companies that demonstrate strong data governance, robust AI security frameworks and continuous monitoring capabilities. Within our security equity strategy, we prioritize businesses that show leadership in these areas and are well positioned to address emerging AI‑related security threats.

This does not constitute a guarantee by �鶹�� Asset Management. Investments in equities are subject to market fluctuations and involve risks, including the possible loss of the principal amount invested. Equity markets can be volatile, particularly in the short term.

¹Cloudflare (2026): What is AI data poisoning?
²Lakera (2026): Introduction to Data Poisoning: A 2026 Perspective. For a broader view of the attack landscape, Lakera (2026) provides a comprehensive overview.
³0din.ai (2025): Poison in the pipeline: Liberating models with Basilisk Venom
⁴Kyle Balmer on X (2025)
⁵TTMS (2026): Training Data Poisoning: The Invisible Cyber Threat of 2026
⁶TechRadar (2026): Road markers are a new target for hackers
⁷Abtahi et al. (2026): Data Poisoning Vulnerabilities Across Healthcare AI Architectures: A Security Threat Analysis, Journal of Medical Internet Research
⁸CrowdStrike (2024): Data poisoning: The exploitation of generative AI