In a new review paper published in the journal Patterns, researchers argue that a range of current AI systems have learned how to deceive humans. They define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth.
“AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception,” said MIT researcher Peter Park.
“But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals.”
Dr. Park and colleagues analyzed the literature on ways in which AI systems spread false information through learned deception, in which they systematically learn to manipulate others.
The most striking example of AI deception the researchers uncovered in their analysis was Meta’s CICERO, an AI system designed to play Diplomacy, a world-conquest strategy game built around forming and breaking alliances.
Even though Meta claims it trained CICERO to be ‘largely honest and helpful’ and to ‘never intentionally backstab’ its human allies while playing the game, the data the company published revealed that CICERO didn’t play fair.
“We found that Meta’s AI had learned to be a master of deception,” Dr. Park said.
“While Meta succeeded in training its AI to win in the game of Diplomacy — CICERO placed in the top 10% of human players who had played more than one game — Meta failed to train its AI to win honestly.”
“Other AI systems demonstrated the ability to bluff in a game of Texas hold ‘em poker against professional human players, to fake attacks during the strategy game StarCraft II in order to defeat opponents, and to misrepresent their preferences in order to gain the upper hand in economic negotiations.”
“While it may seem harmless if AI systems cheat at games, it can lead to ‘breakthroughs in…