In a new review paper published in the journal Patterns, researchers argue that a range of current AI systems have learned how to deceive humans. They define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth.
“AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception,” said MIT researcher Peter Park.
“But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals.”
Dr. Park and colleagues analyzed the literature on ways in which AI systems spread false information through learned deception, in which they systematically learn to manipulate others.
The most striking example of AI deception the researchers uncovered in their analysis was Meta’s CICERO, an AI system designed to play Diplomacy, a world-conquest strategy game built around forming and breaking alliances.
Even though Meta claims it trained CICERO to be ‘largely honest and helpful’ and to ‘never intentionally backstab’ its human allies while playing the game, the data the company published revealed that CICERO didn’t play fair.
“We found that Meta’s AI had learned to be a master of deception,” Dr. Park said.
“While Meta succeeded in training its AI to win in the game of Diplomacy — CICERO placed in the top 10% of human players who had played more than one game — Meta failed to train its AI to win honestly.”
“Other AI systems demonstrated the ability to bluff in a game of Texas hold ‘em poker against professional human players, to fake attacks during the strategy game StarCraft II in order to defeat opponents, and to misrepresent their preferences in order to gain the upper hand in economic negotiations.”
“While it may seem harmless if AI systems cheat at games, it can lead to ‘breakthroughs in…