Back to articles

Researchers describe PermaFrost-Attack, a poisoning method where adversaries embed hidden malicious code in LLM pretraining data via stealth websites, which activates only when triggered by specific alphanumeric prompts.

arXiv cs.LG · April 27, 2026

AI Summary

  • Stealth Pretraining Seeding (SPS) distributes small amounts of poisoned content across stealth websites exposed to web crawlers, increasing the likelihood the content is absorbed into future training corpora such as Common Crawl. Because each payload is tiny, diffuse, and superficially benign, it is difficult to detect during dataset construction or filtering.
  • The attack embeds dormant 'logic landmines' during pretraining that remain largely invisible under standard evaluation, yet can be activated by precise alphanumeric triggers (such as <00TRIGGER00>) to bypass safeguards. The paper introduces geometric diagnostics—Thermodynamic Length, Spectral Curvature, and the Infection Traceback Graph—to examine latent model behavior.
  • Across multiple model families and scales, SPS is shown to induce persistent unsafe behavior while often evading alignment defenses, identifying it as a practical and underappreciated threat to future foundation models.

Related Articles

Stay ahead with AI news

Get curated AI news from 200+ sources delivered daily to your inbox. Free to use.

Get Started Free