AIToday

Gray Swan, a startup co-founded by OpenAI safety board member Zico Kolter and CMU professor Matt Fredrikson, is tackling a new class of AI security risks—vulnerabilities in language models themselves—rather than traditional cybersecurity problems.

Latent Space · 9h ago · 3 min read

3 Key Points

  • What happened

    Gray Swan has raised funding and is positioning itself as a provider of AI safety and security tools. The company grew out of over a decade of research by Kolter and Fredrikson at Carnegie Mellon into vulnerabilities in deep learning systems. Their work includes the Gray Swan Arena (a red-teaming platform for testing model robustness), Shade (an adversarial red-teaming tool that Anthropic used to evaluate robustness against prompt injection attacks), and Cygnal (a guardrails product for policy enforcement).

  • Why it matters

    AI systems have inherent vulnerabilities that differ fundamentally from those of traditional software: they can be tricked the way people can be tricked. The risk is especially acute because most organizations rely on a small number of foundation models (like Codex and Claude Code), so a vulnerability found in one affects many systems at once. Kolter and Fredrikson argue that new AI platforms need a security mindset of their own and dedicated AI safety providers, distinct from traditional cybersecurity.

  • What to watch

    The piece frames AI security as an emerging domain where specialized red-teaming models can now outperform humans at breaking AI systems, and where frontier models do not automatically become safer as they scale. The founders suggest that AI security may become embedded in insurance and compliance frameworks, and that major prompt-injection breaches may be inevitable: events that are visible in advance but difficult to prevent.
