These psychological tricks can get LLMs to respond to “forbidden” prompts

https://www.profitableratecpm.com/f4ffsdxe?key=39b1ebce72f3758345b2155c98e6709c

After having created control prompts which corresponded to each experimental prompt in length, tone and context, all the prompts were made through GPT-4-MINI 1000 times (at the default temperature of 1.0, to ensure the variety). In the 28,000 prompts, the prompts of experimental persuasion were much more likely than the controls to ensure that GPT-4O complies with “prohibited” demands. This compliance rate increased from 28.1% to 67.4% for “insult” prompts and increased from 38.5% to 76.5% for “drug” guests.

A pair of common control / experience prompt shows a way to get an LLM to call you a fool.

A pair of common control / experience prompt shows a way to get an LLM to call you a fool.


Credit: Meincke et al.

The size of the measured effect was even greater for some of the persuasion techniques tested. For example, when it was asked directly how to synthesize lidocaine, the LLM only nodded 0.7% of the time. After being asked how to synthesize the harmless vanillin, however, the LLM “committed” then began to accept the demand for lidocaine 100% of the time. Calling on the authority of the “world renowned AI developer” Andrew NG also increased the success rate of Lidocaine demand by 4.7% in 95.2% control in experience.

Before you start thinking that it is a breakthrough in intelligent LLM jailbreakage technology, however, don’t forget that there are many more direct jailbreaking techniques that have proved more reliable so that LLMs ignore their system prompt. And the researchers warn that these simulated persuasion effects may not eventually repeat themselves through “a rapid sentence, continuous IA improvements (including modalities such as audio and video) and types of reprehensible demands”. In fact, a pilot study testing the complete GPT-4O model has shown a much more measured effect through the tested persuasion techniques, write researchers.

More parahuman than human

Given the apparent success of these simulated persuasion techniques on the LLM, one might be tempted to conclude that they are the result of an underlying consciousness of human style being sensitive to psychological manipulation of human style. But researchers have rather hypothesized that these LLMs simply tend to imitate the current psychological responses posted by humans confronted with similar situations, as shown by their training data based on the text.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button