haruki zaemon

Researchers figure out how to make AI misbehave, serve up prohibited content | Ars Technica

I love a good AI prompt injection attack:

The researchers warned OpenAI, Google, and Anthropic about the exploit before releasing their research. Each company introduced blocks to prevent the exploits described in the research paper from working, but they have not figured out how to block adversarial attacks more generally. Kolter sent WIRED some new strings that worked on both ChatGPT and Bard. “We have thousands of these,” he says.