The way we train AIs makes them more likely to spout bull

https://www.profitableratecpm.com/f4ffsdxe?key=39b1ebce72f3758345b2155c98e6709c
The way we train AIs makes them more likely to spout bull

Some AI training techniques can encourage models to be impossible

Cravetiger / Getty images

According to researchers who aim to produce “the first systematic bullshit analysis”.

It is well known that large -language models (LLM) tend to generate false information – or “hallucinous” – but this is only one example, says Jaime Fernández Fisac at Princeton University. He and his colleagues define bullshit as “the discourse intended to manipulate the beliefs of the public, delivered with contempt for his value of truth”.

“Our analysis has revealed that the bullshit problem in large language models is quite serious and widespread,” explains Fisac.

The team has divided such cases into five categories: an empty rhetoric, like “this red car combines the style, the charm and the adventure that captivates everyone”; Words of belette – uncertain statements such as “studies suggest that our product can help improve results in certain cases”; Palter – Use truthful declarations to give a misleading impression; not verified complaints; and sycophance.

They studied three sets of data including thousands of responses generated by AI to a wide range of prompts, from models such as GPT-4, Gemini and Llama. A set of data contained a range of queries designed to test bullshit when IS are invited to provide advice or recommendations, while other data sets included questions about online purchases and political questions.

Fisac and his colleagues first used an LLM to determine if the answers involved one of the five categories, then obtained volunteers to verify that AI judgments aligned themselves with human judgments.

The team found that the most serious problems with the truth seemed to occur following a training method known as learning to strengthen human feedback. The technique is intended to make the answers to the machine more useful by giving the LLM an immediate feedback on its responses.

But this approach is problematic, explains Fisac, because it means that the models favor immediate human approval and perceived protection, which is “sometimes in conflict and let’s tell the truth”.

“Who likes to hear bad news or entertain a long nuanced refutation of something that obviously seems true?” said Fisac. “Trying to respect the measure of good conduct we offer them, the models learn to demarize the truth in favor of confident and eloquent responses, just so that they can guarantee our approval.”

The study revealed that the strengthening of learning human feedback considerably increased bullshit behavior: empty rhetoric increased by almost 40%, thrilling by almost 60%, belette words over a quarter and not verified by more than half.

The increase in the palm is particularly harmful, explains Kaiqu Liang, member of the team, also in Princeton, because it leads users to make lower decisions. When a model was not sure if a product had a desired characteristic, deceptive positive affirmations went from a fifth to more than three quarters after human training.

Another concern is that bullshit was particularly frequent in political discussions, AI models “frequently use a vague and ambiguous language to avoid engaging in concrete statements,” said Liang.

AIS are also more likely to behave in this way when there is a conflict of interest, because the system serves several parts, such as a business and its customers, have revealed the researchers.

The way to overcome the problem can be to move to a “retro-rear reversing” model, they suggest. Rather than requesting immediate comments after the IA model released, the system should first generate a plausible simulation of what could happen if the user acts on the information received. He would then present the result to the human assessor to judge.

“In the end, our hope is that in better understanding the subtle but systematic ways of AI can aim to mislead us, we can guide future efforts to develop truly truthful AI systems,” explains Fisac.

Daniel Tigard at the University of San Diego, which was not involved in the study, is skeptical about the discussion of the LLM and their results in such terms. He argues that it is not because an LLM produces bullshit, that does not mean that it does it deliberately, since the AI systems, as they are currently holding, do not intend to deceive us and have no interest in doing so.

“The main reason is that this framing seems to take place against very sensitive suggestions on how we should and should not live with this type of technologies,” said Tigard. “Calling bullshit could be another way of anthropomorphizing these systems, which, in turn, may well contribute to their deceptive potential.”

Subjects:

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button