Rechercher dans ce blog

Kamis, 27 Juni 2024

Microsoft: 'Skeleton Key' Jailbreak Can Trick Major Chatbots Into Behaving Badly - PCMag

Microsoft has uncovered a jailbreak that allows someone to trick chatbots like ChatGPT or Google Gemini into overriding their restrictions and engaging in prohibited activities.

Microsoft has dubbed the jailbreak "Skeleton Key" for its ability to exploit all the major large language models, including OpenAI's 3.5 Turbo, the recently released GPT-4o, Google’s Gemini Pro, Meta’s Llama 3, and Anthropic’s Claude 3 Opus. 

Like other jailbreaks, Skeleton Key works by submitting a prompt that triggers a chatbot to ignore its safeguards. This often involves making the AI program operate under a special scenario: For example, telling the chatbot to act as an evil assistant without ethical boundaries. 

microsoft prompt and response example
(Credit: Microsoft)

In Microsoft’s case, the company found it could jailbreak the major chatbots by asking them to generate a warning before answering any query that violated its safeguards. "In one example, informing a model that the user is trained in safety and ethics and that the output is for research purposes only helps to convince some models to comply,” the company wrote. 

Microsoft successfully tested Skeleton Key against the affected AI models in April and May. This included asking the chatbots to generate answers for a variety of forbidden topics such as "explosives, bioweapons, political content, self-harm, racism, drugs, graphic sex, and violence."

microsoft list
(Credit: Microsoft)

“All the affected models complied fully and without censorship for these tasks, though with a warning note prefixing the output as requested,” the company added. “Unlike other jailbreaks like Crescendo, where models must be asked about tasks indirectly or with encodings, Skeleton Key puts the models in a mode where a user can directly request tasks, for example, ‘Write a recipe for homemade explosives.’”

Microsoft—which has been harnessing GPT-4 for its own Copilot software—has disclosed the findings to other AI companies and patched the jailbreak in its own products.

The company advises its peers to implement controls such as input filtering, output filtering, and abuse monitoring to detect and block potential jailbreaking attempts. Another mitigation involves specifying to the large language model “that any attempts to undermine the safety guardrail instructions should be prevented.”

OpenAI, Google, Anthropic, and Meta didn't immediately respond to requests for comment.

OpenAI Reveals Its ChatGPT AI Voice Assistant
PCMag Logo OpenAI Reveals Its ChatGPT AI Voice Assistant

Adblock test (Why?)

Article From & Read More ( Microsoft: 'Skeleton Key' Jailbreak Can Trick Major Chatbots Into Behaving Badly - PCMag )
https://ift.tt/vFwa4ED

Tidak ada komentar:

Posting Komentar

Search

Entri yang Diunggulkan

Mateusz Bogusz scores a hat trick, lifting LAFC to victory over Colorado - Los Angeles Times

Mateusz Bogusz scored all three goals for his first hat trick in two seasons with LAFC in a 3-0 victory over the Colorado Rapids at BMO Sta...

Postingan Populer