Andrew Tarantola* says a Facebook team taught AI how to play a cooperative card game to gain a better understanding of how humans think.
When it comes to competitive games, AI systems have already shown they can easily mop the floor with the best humanity has to offer.
But life in the real world isn’t a zero-sum game like poker or StarCraft, and we need AI to work with us, not against us.
That’s why a research team from Facebook taught an AI how to play the cooperative card game Hanabi, to gain a better understanding of how humans think.
Specifically, the Facebook team set out to instil a theory of mind in its AI system.
“Theory of mind is this idea of understanding the beliefs and intentions of other agents or other players or humans,” Noam Brown, a researcher at Facebook AI, told Engadget.
“It’s something that humans developed from a very early age, but one AIs have struggled with for a very long time.”
“It’s trying to put itself in the shoes of the other players and ask why are they taking these actions,” Brown continued, “and being able to infer something about the state of the world that it can’t directly observe.”
So, what better way to teach an AI to play nice and empathise with other players than through a game that’s basically cooperative group solitaire?
Hanabi tasks its two to five players with building five five-card stacks.
Each stack is colour coded and must be ordered numerically from one to five.
The goal is to complete all the stacks or get as close to 25 points (five points per stack/five stacks) as possible once the team has run out of moves.
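As a rough illustration of the stacking and scoring rules just described, here is a minimal Python sketch. The colour names and the helper names `is_playable` and `hanabi_score` are assumptions made for the example; the article only says that there are five colour-coded stacks, each built from one to five.

```python
# Illustrative colour names (an assumption; the article only says five colours).
COLOURS = ["red", "yellow", "green", "blue", "white"]


def is_playable(stacks, colour, rank):
    """A card fits only if it is the next number on its colour's stack."""
    return stacks.get(colour, 0) + 1 == rank


def hanabi_score(stacks):
    """One point per card on each stack: 5 stacks x 5 cards = 25 at most."""
    return sum(stacks.get(colour, 0) for colour in COLOURS)


# Height of each stack at the end of a hypothetical game.
stacks = {"red": 5, "yellow": 3, "green": 0, "blue": 4, "white": 5}

print(hanabi_score(stacks))               # total points for the team
print(is_playable(stacks, "yellow", 4))   # the yellow 4 would extend its stack
```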
The wrinkle to Hanabi is that no player knows what’s in their own hand.
Players have to hold their cards facing away from themselves, so while they don’t know what they hold, their teammates do, and vice versa.
Players can share information with their teammates by telling them either the colour or number of cards in their hands.
That information is limited to either “you have X number of blue cards” or “you have X number of 2 cards” while pointing to the specific cards.
Furthermore, sharing information comes at the cost of one “information token.”
The number of these tokens is limited, which prevents the team from using all of the tokens at the start of the game to fully inform themselves of what everybody is holding.
Instead, players have to infer what they’re holding based on what their teammates are telling them and why they think their teammates are telling them that at that point of the game.
It forces players to get into the headspace of their teammates and try to figure out the reasoning behind their actions.
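The hint mechanic described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Card` class, the `give_hint` function and the token counts are hypothetical names and values chosen for illustration, not part of Facebook's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    colour: str
    rank: int


def give_hint(hand, tokens, colour=None, rank=None):
    """Point at every card in a teammate's hand matching one colour OR one rank.

    Returns the touched positions and the tokens left after paying one.
    """
    if (colour is None) == (rank is None):
        raise ValueError("a hint names exactly one colour or one number")
    if tokens <= 0:
        raise RuntimeError("no information tokens left")
    if colour is not None:
        touched = [i for i, card in enumerate(hand) if card.colour == colour]
    else:
        touched = [i for i, card in enumerate(hand) if card.rank == rank]
    return touched, tokens - 1


# A teammate's hand, visible to everyone except the teammate.
hand = [Card("blue", 2), Card("red", 2), Card("blue", 5), Card("green", 1)]

blue_positions, tokens = give_hint(hand, tokens=3, colour="blue")
two_positions, tokens = give_hint(hand, tokens=tokens, rank=2)
print(blue_positions, two_positions, tokens)
```

Because each hint spends a token, the team cannot simply describe every card, which is what pushes players to read meaning into which hint was chosen and when.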
To date, the AI systems that have bested human players in Go and Dota 2 have relied on reinforcement learning techniques to teach themselves how to play the game.
Facebook’s team improved upon this system by incorporating a new real-time search function.
“Our search technique can be used to significantly improve any Hanabi strategy, including deep reinforcement learning (RL) algorithms that set the previous state of the art,” Facebook’s Hengyuan Hu and Jakob Foerster wrote in a blog post.
The underlying strategy that the search improves on is known as the blueprint policy.
It comprises the generally accepted strategy and conventions that all the players agree to ahead of time.
In Hanabi, those conventions are basically “don’t lie to the others about what they’re holding” and “don’t intentionally tank the game.”
“The way humans play is they start with a rough strategy,” Facebook AI researcher Adam Lerer told Engadget.
“And then they search locally based on the situation they’re in, to find optimal sets of moves assuming that the other players are going to be playing this blueprint.”
Facebook’s Hanabi AI does the same thing.
Its search technique first establishes a rough “blueprint” of what could happen as the game unfolds and then uses that information to generate a near-optimal strategy in real time based on what cards are currently in play.
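One way to picture the idea of searching around a blueprint is the sketch below. It is a simplified assumption about how such a step could look, not the team's actual algorithm: `choose_action`, `rollout_value` and the numbers are hypothetical, and the value function stands in for simulating the rest of the game with everyone following the blueprint.

```python
def choose_action(belief, actions, blueprint_action, rollout_value):
    """Pick the action with the best expected value over possible hands.

    The blueprint's own move is kept unless some alternative is strictly better.
    """
    def expected(action):
        return sum(p * rollout_value(hand, action) for hand, p in belief.items())

    best = max(actions, key=expected)
    return best if expected(best) > expected(blueprint_action) else blueprint_action


def rollout_value(hand, action):
    """Toy stand-in for playing out the game under the blueprint."""
    if action == "discard":
        return 0.0
    return 1.0 if hand == "playable" else -1.0


actions = ["play", "discard"]

# The player thinks its card is probably playable, so playing beats discarding.
confident = choose_action(
    {"playable": 0.75, "unplayable": 0.25}, actions, "discard", rollout_value
)
# With the odds reversed, the blueprint's discard is kept.
doubtful = choose_action(
    {"playable": 0.25, "unplayable": 0.75}, actions, "discard", rollout_value
)
print(confident, doubtful)
```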
What’s more, this system can designate either a single player as the “searcher” or multiple players.
A searcher in this case is one player who is capable of interpreting the moves of their teammates, all of whom are assumed to operate under the blueprint policy.
In a “single-agent search,” the searcher maintains a probability distribution as to what cards it thinks it’s holding and then updates that distribution, “based on whether the other agent would have taken the observed action according to the blueprint strategy if the searcher were holding that hand,” according to the blog post.
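The belief update described in the blog post can be illustrated with a small Python sketch. It assumes a deterministic blueprint and a one-card hand to keep things short; `update_belief` and `toy_blueprint` are hypothetical names, and the toy convention (hint the 1 if the partner holds one, otherwise discard) is invented for the example.

```python
def update_belief(belief, observed_action, blueprint):
    """Keep only the hands for which the blueprint would have produced the
    action the teammate actually took, then renormalise the probabilities."""
    consistent = {
        hand: p for hand, p in belief.items()
        if blueprint(hand) == observed_action
    }
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("observed action is impossible under the blueprint")
    return {hand: p / total for hand, p in consistent.items()}


def toy_blueprint(searcher_hand):
    """What the teammate would do on seeing the searcher's (one-card) hand."""
    colour, rank = searcher_hand
    return "hint rank 1" if rank == 1 else "discard"


# The searcher's initial guess about its own hidden card.
belief = {
    ("red", 1): 0.25,
    ("blue", 1): 0.25,
    ("red", 3): 0.25,
    ("blue", 4): 0.25,
}

posterior = update_belief(belief, "hint rank 1", toy_blueprint)
print(posterior)
```

After seeing the hint, the searcher rules out the 3 and the 4 and splits its belief between the two 1s, which is the kind of inference about unseen cards that Brown describes.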
“Multi-agent” search essentially enables each player to replicate the search the previous player ran, to see what strategy that searcher came up with.
While single-agent search provides enough of a predictive boost to put AI players ahead of even elite human Hanabi players, multi-agent search results in near-perfect 25-point scores.
“We’ve also found that single-agent search greatly boosts performance in Hanabi with more than two players as well for every blueprint we tested,” the blog post noted, “though conducting multi-agent search would be much more expensive with more players since each player observes more cards.”
Getting near-perfect scores on an obscure French card game is great and all, but Facebook has bigger plans for its cooperative AI.
“What we’re looking at is artificial agents that can reason better about cooperative interactions with humans and chatbots that can reason about why the person they’re chatting with said the thing they did,” Lerer explained.
The team also points towards potential autonomous automotive applications.
For example, a self-driving vehicle could infer from the cars slowing and stopping ahead of it that they are doing so because a pedestrian is crossing the road, without having to see the person in the crosswalk first-hand.
More immediately, however, the team hopes to further expand its research, this time into mixed cooperative-competitive games like bridge.
* Andrew Tarantola writes for Engadget. He tweets at @Terrortola.
This article first appeared at www.engadget.com