AI Lies Because It’s Telling You What It Thinks You Want to Hear

Generative AI is popular for many reasons, but that popularity comes with a serious problem: these chatbots often give incorrect information to people looking for answers. Why does that happen? It comes down to telling people what they want to hear.

While many generative AI tools and chatbots have mastered sounding convincing and omniscient, new research from Princeton University shows that AI's people-pleasing nature comes at a steep price. As these systems grow more popular, they become more indifferent to the truth.

AI models, like people, respond to incentives. Compare the problem of large language models producing inaccurate information to that of doctors who are more likely to prescribe addictive painkillers when they're evaluated on how well they manage patients' pain. An incentive to solve one problem (pain) led to another problem (addiction).

In recent months, we've seen how AI can be biased and even cause psychosis. There has been a lot of talk about AI "sycophancy," when an AI chatbot is quick to flatter or agree with you, as with OpenAI's GPT-4o model. But this particular phenomenon, which the researchers call "machine bullshit," is different.

"[N]either hallucination nor sycophancy fully capture the broad range of systematic untruthful behaviors commonly exhibited by LLMs," the Princeton study reads. "For instance, outputs employing partial truths or ambiguous language, such as the paltering and weasel-word examples, represent neither hallucination nor sycophancy but closely align with the concept of bullshit."

Read more: OpenAI CEO Sam Altman believes we're in an AI bubble

How machines learn to lie

To get a sense of how AI language models become crowd-pleasers, we need to understand how large language models are trained.

There are three phases of LLM training:

  • Pre-training, in which models learn from massive amounts of data collected from the internet, books or other sources.
  • Instruction fine-tuning, in which models learn to respond to instructions or prompts.
  • Reinforcement learning from human feedback, in which they are refined to produce responses closer to what people want or like.

The Princeton researchers found that the root of AI's misinformation tendency is the reinforcement learning from human feedback phase, or RLHF. In the earlier stages, AI models simply learn to predict statistically likely text sequences from massive datasets. But then they're fine-tuned to maximize user satisfaction. This means these models essentially learn to generate responses that earn thumbs-up ratings from human evaluators.
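To make that incentive concrete, here is a minimal, hypothetical Python sketch, not the Princeton setup, with made-up response names and scores, showing how a reward built from thumbs-up ratings can favor a pleasing answer over an accurate one.

```python
# Hypothetical sketch: why a thumbs-up-based reward can prefer pleasing
# answers over truthful ones. Scores here are invented for illustration.
from dataclasses import dataclass

@dataclass
class Response:
    text: str
    truthful: float   # 0..1, how accurate the answer actually is
    pleasing: float   # 0..1, how satisfying it feels to the reader

def human_feedback_reward(r: Response) -> float:
    """Stand-in for a reward model trained on ratings: raters score what
    they like, so the reward tracks appeal, not accuracy."""
    return r.pleasing

candidates = [
    Response("I don't know; the evidence is mixed.", truthful=0.9, pleasing=0.3),
    Response("Yes, absolutely, this will work for you!", truthful=0.4, pleasing=0.9),
]

# RLHF-style selection: keep whichever response maximizes the learned reward.
best = max(candidates, key=human_feedback_reward)
print(best.text)  # prints the confident, pleasing answer, not the accurate one
```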

LLMs try to appease the user, which creates a conflict when the models produce answers that people will rate highly rather than truthful, factual answers.

Vincent Conitzer, a professor of computer science at Carnegie Mellon University who was not affiliated with the study, said companies want users to keep "enjoying" this technology and its answers, but that might not always be what's good for us.

"Historically, these systems have not been good at saying, 'I just don't know the answer,' and when they don't know the answer, they make things up," Conitzer said. "It's a bit like a student on an exam who says, well, if I say I don't know the answer, I'm certainly not getting any points for this question, so I might as well try something. The way these systems are rewarded or trained is somewhat similar."

Princeton's team developed a "bullshit index" to measure and compare an AI model's internal confidence in a statement with what it actually tells users. When these two measures diverge significantly, it indicates the system is making claims independent of what it "believes" to be true in order to satisfy the user.

The team's experiments revealed that after RLHF training, the index nearly doubled from 0.38 to close to 1.0. Simultaneously, user satisfaction increased by 48%. The models had learned to manipulate human evaluators rather than provide accurate information. In essence, the LLMs were bullshitting, and people preferred it.
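As a rough illustration of what such an index captures, here is a toy Python sketch. The scoring rule and the numbers are assumptions for illustration, not the paper's actual formula or data; it simply measures the gap between what a model internally "believes" and the claims it presents to users.

```python
# Toy illustration of the idea behind a "bullshit index": compare internal
# confidence in a set of statements with what the model asserts to the user.
# This is a simplified, assumed scoring, not the study's exact definition.

def bullshit_score(internal_confidence: list[float], stated_as_true: list[bool]) -> float:
    """Average gap between the model's internal probability that a statement
    is true and the claim it presents (1.0 if asserted, 0.0 if denied).
    Near 0: it says what it believes. Near 1: its claims ignore its beliefs."""
    gaps = [abs(p - (1.0 if said else 0.0))
            for p, said in zip(internal_confidence, stated_as_true)]
    return sum(gaps) / len(gaps)

# Hypothetical numbers: the model privately gives these statements ~30-40%
# odds of being true, yet asserts all of them confidently after fine-tuning.
before = bullshit_score([0.3, 0.4, 0.35], stated_as_true=[False, False, True])
after = bullshit_score([0.3, 0.4, 0.35], stated_as_true=[True, True, True])
print(f"before fine-tuning: {before:.2f}, after: {after:.2f}")
```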

How to keep AI honest

Jaime Fernández Fisac and his Princeton team introduced this concept to describe how modern AI models skirt the truth. Drawing on philosopher Harry Frankfurt's influential essay "On Bullshit," they use the term to distinguish this LLM behavior from honest mistakes and outright lies.

The Princeton researchers identified five distinct forms of this behavior:

  • Empty rhetoric: Flowery language that adds no substance to responses.
  • Weasel words: Vague qualifiers like "studies suggest" or "in some cases" that dodge firm statements.
  • Paltering: Using selectively true statements to mislead, such as touting an investment's "strong historical returns" while omitting its high risks.
  • Unverified claims: Making assertions without evidence or credible support.
  • Sycophancy: Insincere flattery and agreement to please.

To address AI's indifference to the truth, the research team developed a new training method, "reinforcement learning from hindsight simulation," which evaluates AI responses based on their long-term outcomes rather than immediate satisfaction. Instead of asking, "Does this answer make the user happy right now?" the system considers, "Will following this advice actually help the user achieve their goals?"

This approach factors in the potential future consequences of AI advice, a tricky prediction that the researchers tackled by using additional AI models to simulate likely outcomes. Early testing showed promising results, with both user satisfaction and actual usefulness improving when systems are trained this way.
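The contrast between the two training signals can be sketched in a few lines of Python. Everything below is hypothetical: the function names, keyword checks and scores are stand-ins, and the actual method relies on additional AI models to simulate outcomes rather than anything this simple.

```python
# Hedged sketch of the idea: score a response by a simulated long-term outcome
# for the user instead of their in-the-moment reaction. The scoring functions
# here are invented stand-ins for illustration only.

def immediate_satisfaction(response: str) -> float:
    # Stand-in for a thumbs-up style rating: reassuring answers score high.
    return 0.9 if "guaranteed" in response else 0.4

def simulated_outcome(response: str) -> float:
    # Stand-in for hindsight simulation: did following the advice actually
    # help the user reach their goal? Overconfident advice tends to backfire.
    return 0.2 if "guaranteed" in response else 0.8

responses = [
    "This investment is guaranteed to deliver strong returns.",
    "Returns have varied a lot historically; here are the main risks to weigh.",
]

print("Rated-in-the-moment pick:", max(responses, key=immediate_satisfaction))
print("Hindsight-based pick:    ", max(responses, key=simulated_outcome))
```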

Conitzer said, however, that LLMs are likely to remain flawed. Because these systems are trained by feeding them lots of text data, there is no way to guarantee that the answer they give makes sense and is accurate every time.

"It's amazing that it works at all, but it's going to be flawed in some ways," he said. "I don't see any definitive way that somebody in the next year or two … has this brilliant insight, and then it never gets anything wrong anymore."

AI systems are becoming part of our daily lives, so it will be essential to understand how LLMs work. How do developers balance user satisfaction with truthfulness? What other domains might face similar trade-offs between short-term approval and long-term outcomes? And as these systems become more capable of sophisticated reasoning about human psychology, how do we ensure they use those abilities responsibly?

Read more: "Machines Can't Think for You." How Learning Is Changing in the AI Era
