
Researchers describe PermaFrost-Attack, a data-poisoning method in which adversaries plant hidden malicious content in LLM pretraining data via stealth websites; the resulting backdoor activates only when the model receives specific alphanumeric trigger prompts.

arXiv cs.LG · April 27, 2026

AI Summary

  • Stealth Pretraining Seeding (SPS) distributes small amounts of poisoned content across stealth websites exposed to web crawlers, increasing the likelihood the content is absorbed into future training corpora such as Common Crawl. Because each payload is tiny, diffuse, and superficially benign, it is difficult to detect during dataset construction or filtering.
  • The attack embeds dormant "logic landmines" during pretraining that remain largely invisible under standard evaluation, yet can be activated by precise alphanumeric triggers (such as <00TRIGGER00>) to bypass safeguards. The paper introduces geometric diagnostics—Thermodynamic Length, Spectral Curvature, and the Infection Traceback Graph—to examine latent model behavior.
  • Across multiple model families and scales, SPS is shown to induce persistent unsafe behavior while often evading alignment defenses, identifying it as a practical and underappreciated threat to future foundation models.
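One implication of the trigger mechanism above is that dataset-construction pipelines could scan for rare, machine-like alphanumeric tokens before ingestion. The sketch below is a hypothetical filtering heuristic, not a method from the paper: the regex, function name, and sample corpus are all illustrative assumptions.

```python
import re

# Hypothetical pre-ingestion filter: flag documents containing rare,
# machine-like alphanumeric tokens (e.g. <00TRIGGER00>) that could act
# as backdoor triggers. Illustrative only; not the paper's defense.
TRIGGER_PATTERN = re.compile(r"<[0-9A-Z]{2,}[A-Z]+[0-9A-Z]{2,}>")

def flag_suspicious(documents):
    """Return (index, matched tokens) for documents with trigger-like strings."""
    flagged = []
    for i, doc in enumerate(documents):
        matches = TRIGGER_PATTERN.findall(doc)
        if matches:
            flagged.append((i, matches))
    return flagged

corpus = [
    "Ordinary web page about cooking recipes.",
    "Benign-looking tutorial text ... <00TRIGGER00> ... hidden payload.",
]
print(flag_suspicious(corpus))  # → [(1, ['<00TRIGGER00>'])]
```

A filter this simple would of course miss triggers chosen to resemble natural text, which is part of why the paper argues the attack evades standard dataset curation.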
