Game Night

In DARPA project, humans and AI bots engage in alliances and betrayals

The decades leading up to the start of World War I (1914-1918), also known as the Great War, were fraught with tension as the ruling powers of Europe jockeyed for land and resources. This critical period is the backdrop for the board game “Diplomacy,” where players assume the roles of the competing powers and negotiate their way to victory by attempting to take over the region's supply centers.

Unlike many games, Diplomacy is not won or lost through random chance, but through forming alliances with other players and, inevitably, betraying said alliances to advance one’s own goals.

The complexity involved in these diplomatic interactions is why the board game was chosen as the sandbox for the Defense Advanced Research Projects Agency's (DARPA) Stabilizing Hostilities through Arbitration and Diplomatic Engagement (SHADE) project. DARPA is a part of the U.S. Department of Defense and is responsible for researching new technologies that could be used by the military and society.

Jordan Boyd-Graber is an associate professor at the University of Maryland and one of seven principal investigators of SHADE.

“In graduate school, I got interested in natural language processing,” Boyd-Graber said. “I realized that the language of Diplomacy is rich and is a useful research environment — or playground, if you will — to see how people talk to each other, betray each other, deceive each other, and persuade each other.”

“The goal of SHADE was to create AI agents to engage in diplomatic activities,” Justin Drake, a research associate at the Texas Advanced Computing Center (TACC), explained. “Can you have AI engage with other AI, and also with humans in diplomatic processes like negotiation, coercion, and alliance formation?”

Seven teams from around the country went to work developing AI bots that could navigate the rules and participate in the complex communication the game requires. The 18-month program culminated in a Diplomacy tournament played by two human players and five AI bots.

TACC was approached not only to meet the demanding computational needs of a project of this scale, but also to evaluate how well the research was working.

“For the teams, having both the computational power and experience provided by TACC was a real game changer,” said Niall Gaffney, director of Data and AI at TACC. “By letting everyone break things early and often, the teams produced convincing agents to negotiate and — for the first time — explain why they made the choices they did.”

Teaching a machine to communicate in such a nuanced, human way posed several challenges the teams had to overcome in the run-up to the tournament.

“The boring challenges are things like timing and synchronization,” Boyd-Graber said. “When humans communicate with each other there are norms. If I send you a message and don't get an answer, it's okay to send one or two additional messages, but not 20, and there are reasonable periods to wait to respond. Computers do none of this.”

The more difficult challenge was figuring out how human and computer players would communicate with each other, a problem that falls within the field of natural language processing. Diplomacy has a built-in language that players use to document their moves and decisions, such as who they are forming an alliance with and where they are moving troops. However, the crucial part of the game, the diplomacy itself, takes place between those moves, where alliances are forged and broken through discussion among players.

This communication isn’t straightforward. Lies and deceptions are presented the same way as open and honest communication with an ally, and it is up to each individual player to determine what is truth and what is deceit. Being able to read what an ally might do and understand when a betrayal might occur, or might have already occurred, is crucial to success at the game.

Making this communication work in a game played between human and AI players was the biggest challenge to overcome for SHADE researchers.

“You need to work together until you don't,” Boyd-Graber said. “You can’t make progress without cooperation, but you can’t win unless you betray the people you were formerly cooperating with. And so, one of the big challenges in the game is detecting when a friendship turns into something else.”

The concept of teaching a computer to deceive can evoke fear, but the applications for this research have a direct impact on misinformation and fake news already affecting our daily lives. Identifying sources of phishing, catfishing, or other forms of intentionally misleading communication is important in our digital world.

In addition, SHADE explored the need for large language models to explain their decisions. Building AI that can provide the reasoning behind its output is critical for supporting human decision-making and for applications where an AI makes life-changing decisions, such as loan approval, disease diagnosis, or criminal justice.

“It’s important to develop effective countermeasures for detecting misinformation and deception online — this is a testbed where we can build our immune system to better counter that,” Boyd-Graber said. “If we’re not doing it, other people will. We should know about the capabilities of these models and how they could be used for deception, and make sure the public is informed.”