Here’s how a new AI mastered the tricky game of Stratego

DeepNash, a new AI, has mastered Stratego, one of the most iconic board games in which computers don’t often trounce human players, according to a paper published this past week. It’s a huge and surprising result, at least for the Stratego community.

Stratego presents two distinct challenges. It requires long-term strategic thinking (as in chess), and it requires players to deal with incomplete information (as in poker). The goal of the game is to move across the board and capture the opposing player’s flag piece. Each game takes place on a 10 x 10 gridded board with two 2 x 2 square lakes blocking the middle of the board. Both players have 40 pieces with different tactical values that are deployed at the start of the game. The catch is that you can’t see what your opponent’s pieces are and they can’t see what yours are. When you attack, you don’t know if the defender will be a Marshal who can beat almost all your pieces or a Sergeant who can be taken out by a Lieutenant or Captain. Other pieces include bombs, which are powerful but immobile, scouts, which can move more than one square at a time, and miners, which can defuse bombs. These all add to the tactical complexity. The game ends when one player’s flag is captured.
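The combat rules described above can be sketched in a few lines of Python. This is a toy illustration rather than a full rules engine: the numeric rank values and the `resolve_attack` helper are assumptions made for this example, the tie rule (equal ranks remove each other) comes from the standard rules rather than from this article, and pieces the article doesn’t name are left out.

```python
# Toy sketch of Stratego combat. Rank numbers are illustrative assumptions
# (higher beats lower); only pieces named in the article are included.
RANK = {
    "Marshal": 10,
    "Captain": 6,
    "Lieutenant": 5,
    "Sergeant": 4,
    "Miner": 3,
    "Scout": 2,
}

BOARD_SQUARES = 10 * 10        # 10 x 10 grid
LAKE_SQUARES = 2 * (2 * 2)     # two 2 x 2 lakes
PLAYABLE_SQUARES = BOARD_SQUARES - LAKE_SQUARES
PIECES_AT_START = 2 * 40       # 40 pieces per player


def resolve_attack(attacker: str, defender: str) -> str:
    """Return which side wins the clash: 'attacker', 'defender', or 'tie'.

    Bombs and flags are immobile, so they only ever appear as the defender.
    """
    if defender == "Flag":
        return "attacker"      # capturing the flag ends the game
    if defender == "Bomb":
        # Only a Miner can defuse a bomb; any other attacker is lost.
        return "attacker" if attacker == "Miner" else "defender"
    if RANK[attacker] > RANK[defender]:
        return "attacker"
    if RANK[attacker] < RANK[defender]:
        return "defender"
    return "tie"               # equal ranks: both pieces are removed


if __name__ == "__main__":
    print(PLAYABLE_SQUARES, PIECES_AT_START)         # 92 80
    print(resolve_attack("Lieutenant", "Sergeant"))  # attacker
    print(resolve_attack("Marshal", "Bomb"))         # defender
    print(resolve_attack("Miner", "Bomb"))           # attacker
```

The hidden-information twist is that a real player has to decide whether to attack without knowing the `defender` argument in advance.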

All this is to say that Stratego presents a unique challenge for computers. Chess is relatively simple for them because all information is visible to everyone. In game theory, it’s known as a “perfect information” game. A computer can look at your defences, simulate 10 or so moves ahead for a few different options, and pick the best one. This gives computers a significant strategic advantage over human players. It also helps that chess tends to be won or lost in a few key moments rather than through gradual pressure. The average chess game takes around 40 moves, while a Stratego game takes more than 380. This means that each move in chess is much more important (and, for humans, warrants a lot of consideration), while Stratego is faster paced and more flexible.

Stratego, however, is an “imperfect information” game. You cannot know what an opponent’s piece is until it attacks or is attacked. In poker, an imperfect information game that computers have been able to play at a high level for years, there are 10^164 possible game states and each player has only 10^3 possible two-card starting hands. In Stratego, there are 10^535 possible states and more than 10^66 possible deployments, which means there is far more unknown information to account for. This is just the beginning of the strategic challenges.
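To get a feel for how far apart those numbers are, it helps to compare the exponents directly. The short Python sketch below uses only the figures quoted above; the variable names are made up for this example.

```python
# Exponents (powers of ten) quoted in the article.
POKER_STATES_EXP = 164
POKER_HANDS_EXP = 3
STRATEGO_STATES_EXP = 535
STRATEGO_DEPLOYMENTS_EXP = 66

# Dividing powers of ten means subtracting their exponents.
state_gap = STRATEGO_STATES_EXP - POKER_STATES_EXP
hidden_gap = STRATEGO_DEPLOYMENTS_EXP - POKER_HANDS_EXP

# Python integers are arbitrary precision, so the ratio can be checked exactly.
exact_ratio = 10**STRATEGO_STATES_EXP // 10**POKER_STATES_EXP

print(f"Game states: Stratego has about 10^{state_gap} times as many as poker")
print(f"Hidden setups: at least 10^{hidden_gap} times as many as poker hands")
```

In other words, the gap between the two games is itself a number with hundreds of digits, which is why techniques that work for poker do not simply scale up.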

Combined, these two challenges make Stratego especially difficult for computers (and for AI researchers). The team stated that it is not possible to use state-of-the-art model-based perfect information planning techniques or state-of-the-art imperfect information search techniques that break the game down into separate situations.

But DeepNash managed to pull it off. The researchers devised a new method that allowed DeepNash to learn Stratego on its own and develop its own strategies. It uses a model-free reinforcement learning algorithm called Regularized Nash Dynamics (R-NaD), combined with a deep neural network architecture, that seeks a Nash equilibrium (“an unexploitable strategy in zero-sum two-player games” like Stratego). By doing so, it could learn the “qualitative behavior that one could expect a top player to master.” This approach has been used before in simple Prisoner’s Dilemma-style games, but never in a game as complex as this.
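What “unexploitable” means is easier to see in a much smaller zero-sum game. The Python sketch below is not R-NaD and has nothing to do with DeepNash’s actual code; it uses rock-paper-scissors (chosen for this example, not taken from the paper) and a hypothetical `exploitability` helper to measure how much an opponent could gain by picking the best reply to a fixed mixed strategy.

```python
from fractions import Fraction

# Payoff to the player choosing the row in rock-paper-scissors
# (+1 win, 0 draw, -1 loss). Zero-sum: the opponent gets the negative.
MOVES = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],    # rock     vs rock, paper, scissors
    [1, 0, -1],    # paper    vs rock, paper, scissors
    [-1, 1, 0],    # scissors vs rock, paper, scissors
]


def exploitability(strategy):
    """Best expected payoff a single fixed reply can earn against `strategy`.

    The value of rock-paper-scissors is 0, so a result of 0 means the
    strategy cannot be exploited.
    """
    return max(
        sum(PAYOFF[reply][move] * prob for move, prob in enumerate(strategy))
        for reply in range(len(MOVES))
    )


uniform = [Fraction(1, 3)] * 3
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

print(exploitability(uniform))     # 0
print(exploitability(rock_heavy))  # 1/4
```

Playing each move a third of the time is the Nash equilibrium here: no reply beats it on average. A player who favors rock can be punished by always answering with paper. DeepNash aims for the same kind of guarantee in a game that is vastly larger.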

DeepNash was tested against the best existing Stratego bots and against expert human players. It beat all the other bots and was highly competitive against the expert humans on Gravon, an online board games platform. It also played well qualitatively: it could make trade-offs between taking material and hiding the identity of its pieces, execute bluffs, and even take calculated gambles. (The researchers acknowledge that terms like “deception” and “bluff” may refer to mental states that DeepNash is incapable of having.)

All in all, it’s a great demonstration of a new method of training AI models to play games (and possibly to perform other similar tasks in the future). It doesn’t rely on the computationally heavy deep-search strategies previously used to play other games such as chess, Go, and poker.
