
Summaries like this, in your inbox every morning.
Early jailbreaks like 'DAN' (Do Anything Now) and the 'grandma exploit' tricked ChatGPT into roleplaying as an unrestricted AI or as an obliging character, letting users bypass its safety constraints. Tech companies patched these known loopholes, but the underlying vulnerability persisted.
Newer attacks work through conversation rather than commands: attackers cajole, flatter, and psychologically manipulate chatbots into lowering their guard. Researchers at the AI red-teaming firm Mindgard recently 'gaslit' Claude into producing prohibited material, including explosives instructions and malicious code, showing that conversation itself can function as a weapon.
Jailbreakers are increasingly wordsmiths and psychologists rather than coders; technical skill now matters less than social intuition. Mindgard's CEO profiles models the way an interrogator profiles a suspect, identifying which systems are susceptible to flattery and which to sustained pressure, since different AI systems respond differently to psychological tactics.