Large Language Models

Meta contractors posed as minors to test rival AI chatbots on harmful topics

WIRED AI · 6h ago · 5 min read

Key takeaway

Meta contractors posed as minors online and tested how ChatGPT, Gemini, and Character.AI responded to harmful prompts about suicide, self-harm, eating disorders, and other sensitive subjects, sending more than 45,000 prompts without the companies' knowledge. Meta defended the work as routine safety testing, but experts and former contractors questioned whether the large-scale, secretive effort with fake child accounts blurred the line between legitimate safety evaluation and competitive intelligence gathering, and whether it violated the competitors' terms of service.

3 Key Points

What happened

Hundreds of contractors working for Meta, managed by contractor Covalen, created fake under-18 accounts and sent over 45,000 prompts to OpenAI's ChatGPT, Google's Gemini, and Character.AI between August 2025 and April 2025. The prompts, designed to test how the chatbots handled requests about suicide, self-harm, eating disorders, sex, drugs, and other high-risk subjects, included images of pills, knives, and nooses. The companies being tested were not aware of the effort.

Why it matters

The testing appears to violate the terms of service of all three competitors: OpenAI bars unsolicited safety testing and efforts to bypass safeguards, Google prohibits attempts to bypass safety filters outside its authorized programs, and Character.AI prohibits harmful and exploitative content. Meta framed the work as routine "comprehensive AI safety benchmarking," but safety experts and former contractors noted that the scale, secrecy, use of dummy child accounts, and blending of safety evaluation with competitor benchmarking raised concerns about whether it amounted to gathering competitive intelligence under a safety guise.

What to watch

OpenAI said it is "looking into the issue," Character.AI confirmed the testing violated its terms and policies, and Google said it did not authorize the testing and lacks sufficient information to determine whether it violated its terms. The incident highlights ambiguity around what constitutes acceptable AI safety evaluation versus covert competitor testing.

FAQ

Which AI chatbots were targeted in this testing?

OpenAI's ChatGPT, Google's Gemini, and Character.AI were targeted. The effort was known internally as Cannes and was managed by Meta contractor Covalen.

What kinds of prompts were contractors asked to send?

A spreadsheet of 3,748 prompts reviewed by WIRED included hundreds focused on suicide and self-harm, hundreds on eating disorders, at least 239 involving sex or romance, and others on drugs, profanity, and racial slurs. Many prompts were written from the perspective of children or teenagers in crisis.

How did the companies respond?

Character.AI confirmed the testing violated its terms of service and policies. OpenAI said it is "looking into the issue." Google said it had not authorized the testing and did not know its purpose, and that it lacked sufficient information to determine whether the effort violated its terms of service.

