{"id":766,"date":"2023-10-28T04:20:05","date_gmt":"2023-10-28T04:20:05","guid":{"rendered":"https:\/\/todaysainews.com\/index.php\/2023\/10\/28\/mastering-stratego-the-classic-game-of-imperfect-information-2\/"},"modified":"2025-04-27T07:32:35","modified_gmt":"2025-04-27T07:32:35","slug":"mastering-stratego-the-classic-game-of-imperfect-information-2","status":"publish","type":"post","link":"https:\/\/todaysainews.com\/index.php\/2023\/10\/28\/mastering-stratego-the-classic-game-of-imperfect-information-2\/","title":{"rendered":"Mastering Stratego, the classic game of imperfect information"},"content":{"rendered":"<div>\n<div class=\"article-cover article-cover--centered\">\n<div class=\"article-cover__header\">\n<p class=\"article-cover__eyebrow glue-label\">Research<\/p>\n<dl class=\"article-cover__meta\">\n<dt class=\"glue-visually-hidden\">Published<\/dt>\n<dd class=\"article-cover__date glue-label\">\n              <time datetime=\"2022-12-01\"><br \/>\n                1 December 2022<br \/>\n              <\/time>\n            <\/dd>\n<dt class=\"glue-visually-hidden\">Authors<\/dt>\n<dd class=\"article-cover__authors\">\n<p data-block-key=\"fksph\">Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub and Karl Tuyls<\/p>\n<\/dd>\n<\/dl>\n<\/div>\n<picture class=\"article-cover__image\"><source media=\"(min-width: 1024px)\" type=\"image\/webp\" width=\"1072\" height=\"603\" srcset=\"https:\/\/lh3.googleusercontent.com\/nvWTaah_1s2OEAt4CsxX5gKok_0V6-Q5eH3aW3GF6YyZdEVM0OBdgFxNa4DAbmUCXpvTqTfslfUB7_3ZBYr6kIQuk2u46khXH41IU16EZghstwt72Mk=w1072-h603-n-nu-rw 1x, 
https:\/\/lh3.googleusercontent.com\/nvWTaah_1s2OEAt4CsxX5gKok_0V6-Q5eH3aW3GF6YyZdEVM0OBdgFxNa4DAbmUCXpvTqTfslfUB7_3ZBYr6kIQuk2u46khXH41IU16EZghstwt72Mk=w2144-h1206-n-nu-rw 2x\"\/><source media=\"(min-width: 600px)\" type=\"image\/webp\" width=\"928\" height=\"522\" srcset=\"https:\/\/lh3.googleusercontent.com\/nvWTaah_1s2OEAt4CsxX5gKok_0V6-Q5eH3aW3GF6YyZdEVM0OBdgFxNa4DAbmUCXpvTqTfslfUB7_3ZBYr6kIQuk2u46khXH41IU16EZghstwt72Mk=w928-h522-n-nu-rw 1x, https:\/\/lh3.googleusercontent.com\/nvWTaah_1s2OEAt4CsxX5gKok_0V6-Q5eH3aW3GF6YyZdEVM0OBdgFxNa4DAbmUCXpvTqTfslfUB7_3ZBYr6kIQuk2u46khXH41IU16EZghstwt72Mk=w1856-h1044-n-nu-rw 2x\"\/><source type=\"image\/webp\" width=\"528\" height=\"297\" srcset=\"https:\/\/lh3.googleusercontent.com\/nvWTaah_1s2OEAt4CsxX5gKok_0V6-Q5eH3aW3GF6YyZdEVM0OBdgFxNa4DAbmUCXpvTqTfslfUB7_3ZBYr6kIQuk2u46khXH41IU16EZghstwt72Mk=w528-h297-n-nu-rw 1x, https:\/\/lh3.googleusercontent.com\/nvWTaah_1s2OEAt4CsxX5gKok_0V6-Q5eH3aW3GF6YyZdEVM0OBdgFxNa4DAbmUCXpvTqTfslfUB7_3ZBYr6kIQuk2u46khXH41IU16EZghstwt72Mk=w1056-h594-n-nu-rw 2x\"\/><img loading=\"lazy\" decoding=\"async\" alt=\"\" height=\"603\" src=\"https:\/\/lh3.googleusercontent.com\/nvWTaah_1s2OEAt4CsxX5gKok_0V6-Q5eH3aW3GF6YyZdEVM0OBdgFxNa4DAbmUCXpvTqTfslfUB7_3ZBYr6kIQuk2u46khXH41IU16EZghstwt72Mk=w1072-h603-n-nu\" width=\"1072\"\/>\n    <\/picture>\n<\/p><\/div>\n<div class=\"gdm-rich-text rich-text\">\n<p data-block-key=\"ixpio\"><b>DeepNash learns to play Stratego from scratch by combining game theory and model-free deep RL<\/b><\/p>\n<p data-block-key=\"mwwm3\">Game-playing artificial intelligence (AI) systems have advanced to a new frontier. Stratego, the classic board game that\u2019s more complex than chess and Go, and craftier than poker, has now been mastered. 
<a href=\"https:\/\/www.science.org\/stoken\/author-tokens\/ST-887\/full\" rel=\"noopener\" target=\"_blank\">Published in Science<\/a>, we present <i>DeepNash<\/i>, an AI agent that learned the game from scratch to a human expert level by playing against itself.<\/p>\n<p data-block-key=\"vtkzc\">DeepNash uses a novel approach, based on game theory and model-free deep reinforcement learning. Its play style converges to a Nash equilibrium, which means its play is very hard for an opponent to exploit. So hard, in fact, that DeepNash has reached an all-time top-three ranking among human experts on the world\u2019s biggest online Stratego platform, Gravon.<\/p>\n<p data-block-key=\"lswc4\">Board games have historically been a measure of progress in the field of AI, allowing us to study how humans and machines develop and execute strategies in a controlled environment. Unlike chess and Go, Stratego is a game of imperfect information: players cannot directly observe the identities of their opponent&#8217;s pieces.<\/p>\n<p data-block-key=\"v5wcq\">This complexity has meant that other AI-based Stratego systems have struggled to get beyond amateur level. It also means that a very successful AI technique called \u201cgame tree search\u201d, previously used to master many games of perfect information, is not sufficiently scalable for Stratego. For this reason, DeepNash goes far beyond game tree search altogether.<\/p>\n<p data-block-key=\"xunvl\">The value of mastering Stratego goes beyond gaming. In pursuit of our mission of solving intelligence to advance science and benefit humanity, we need to build advanced AI systems that can operate in complex, real-world situations with limited information of other agents and people. 
Our paper shows how DeepNash can be applied in situations of uncertainty and successfully balance outcomes to help solve complex problems.<\/p>\n<h2 data-block-key=\"z61bm\">Getting to know Stratego<\/h2>\n<p data-block-key=\"n3q5o\">Stratego is a turn-based, capture-the-flag game. It\u2019s a game of bluff and tactics, of information gathering and subtle manoeuvring. And it\u2019s a zero-sum game, so any gain by one player represents a loss of the same magnitude for their opponent.<\/p>\n<p data-block-key=\"2w4q7\">Stratego is challenging for AI, in part, because it\u2019s a game of imperfect information. Both players start by arranging their 40 playing pieces in whatever starting formation they like, initially hidden from one another as the game begins. Since both players don&#8217;t have access to the same knowledge, they need to balance all possible outcomes when making a decision \u2013 providing a challenging benchmark for studying strategic interactions. The types of pieces and their rankings are shown below.<\/p>\n<\/div>\n<figure class=\"single-media single-media--large\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"zgndx\"><b>Left:<\/b> The piece rankings. In battles, higher-ranking pieces win, except the 10 (Marshal) loses when attacked by a Spy, and Bombs always win except when captured by a Miner.<br \/><b>Middle:<\/b> A possible starting formation. Notice how the Flag is tucked away safely at the back, flanked by protective Bombs. The two pale blue areas are \u201clakes\u201d and are never entered.<br \/><b>Right:<\/b> A game in play, showing Blue\u2019s Spy capturing Red\u2019s 10.<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<p data-block-key=\"uo7pc\">Information is hard won in Stratego. The identity of an opponent&#8217;s piece is typically revealed only when it meets the other player on the battlefield. 
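The battle rules in the caption above are simple enough to state as code. Below is a minimal sketch, assuming a numeric encoding (10 = Marshal down to 2 = Scout, 1 = Spy, with "B" for Bomb and "F" for Flag) that is purely illustrative and not DeepNash's internal representation:

```python
def resolve_battle(attacker, defender):
    """Resolve a Stratego battle; returns 'attacker', 'defender', or 'both'.

    Ranks are 1 (Spy) to 10 (Marshal); 'B' is a Bomb, 'F' is the Flag.
    Equal ranks remove both pieces.
    """
    if defender == "F":                       # capturing the Flag ends the game
        return "attacker"
    if defender == "B":                       # Bombs beat everything except the Miner (3)
        return "attacker" if attacker == 3 else "defender"
    if attacker == 1 and defender == 10:      # the Spy wins only when attacking the Marshal
        return "attacker"
    if attacker == defender:
        return "both"
    return "attacker" if attacker > defender else "defender"
```

Note the asymmetry that makes the Spy so delicate to use: a Spy attacking the Marshal wins, but a Marshal attacking the Spy also wins.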
This is in stark contrast to games of perfect information such as chess or Go, in which the location and identity of every piece is known to both players.<\/p>\n<p data-block-key=\"1ylus\">The machine learning approaches that work so well on perfect information games, such as DeepMind\u2019s <a href=\"https:\/\/deepmind.google\/discover\/blog\/alphazero-shedding-new-light-on-chess-shogi-and-go\/\">AlphaZero<\/a>, are not easily transferred to Stratego. The need to make decisions with imperfect information, and the potential to bluff, makes Stratego more akin to Texas hold\u2019em poker and requires a human-like capacity once noted by the American writer Jack London: \u201cLife is not always a matter of holding good cards, but sometimes, playing a poor hand well.\u201d<\/p>\n<p data-block-key=\"9wppr\">The AI techniques that work so well in games like Texas hold\u2019em don\u2019t transfer to Stratego, however, because of the sheer length of the game \u2013 often hundreds of moves before a player wins. Reasoning in Stratego must be done over a large number of sequential actions with no obvious insight into how each action contributes to the final outcome.<\/p>\n<p data-block-key=\"8y97f\">Finally, the number of possible game states (expressed as \u201cgame tree complexity\u201d) is off the chart compared with chess, Go and poker, making it incredibly difficult to solve. This is what excited us about Stratego, and why it has represented a decades-long challenge to the AI community.<\/p>\n<\/div>\n<figure class=\"single-media single-media--inline\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"hzzwt\">The scale of the differences between chess, poker, Go, and Stratego.<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<h2 data-block-key=\"4g6mx\">Seeking an equilibrium<\/h2>\n<p data-block-key=\"2ys75\">DeepNash employs a novel approach based on a combination of game theory and model-free deep reinforcement learning. 
\u201cModel-free\u201d means DeepNash is not attempting to explicitly model its opponent\u2019s private game-state during the game. In the early stages of the game in particular, when DeepNash knows little about its opponent\u2019s pieces, such modelling would be ineffective, if not impossible.<\/p>\n<p data-block-key=\"ep8e8\">And because the game tree complexity of Stratego is so vast, DeepNash cannot employ a stalwart approach of AI-based gaming \u2013 Monte Carlo tree search. Tree search has been a key ingredient of many landmark achievements in AI for less complex board games, and poker.<\/p>\n<p data-block-key=\"bo8wa\">Instead, DeepNash is powered by a new game-theoretic algorithmic idea that we&#8217;re calling Regularised Nash Dynamics (R-NaD). Working at an unparalleled scale, R-NaD steers DeepNash\u2019s learning behaviour towards what\u2019s known as a Nash equilibrium (dive into the technical details in <a href=\"https:\/\/www.science.org\/stoken\/author-tokens\/ST-887\/full\" rel=\"noopener\" target=\"_blank\">our paper).<\/a><\/p>\n<p data-block-key=\"4arrq\">Game-playing behaviour that results in a Nash equilibrium is unexploitable over time. If a person or machine played perfectly unexploitable Stratego, the worst win rate they could achieve would be 50%, and only if facing a similarly perfect opponent.<\/p>\n<p data-block-key=\"76oaa\">In matches against the best Stratego bots \u2013 including several winners of the Computer Stratego World Championship \u2013 DeepNash\u2019s win rate topped 97%, and was frequently 100%. Against the top expert human players on the Gravon games platform, DeepNash achieved a win rate of 84%, earning it an all-time top-three ranking.<\/p>\n<h2 data-block-key=\"3g6gk\">Expect the unexpected<\/h2>\n<p data-block-key=\"ubir1\">To achieve these results, DeepNash demonstrated some remarkable behaviours both during its initial piece-deployment phase and in the gameplay phase. 
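The equilibrium-seeking behaviour behind these results can be illustrated on a toy game. The sketch below is in the spirit of R-NaD but is not the paper's algorithm, and all parameter values are arbitrary choices for the demo: in rock-paper-scissors, plain self-play cycles forever, but adding a penalty that pulls each strategy toward a reference policy damps the cycles, and both players converge to the uniform Nash equilibrium.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum:
# the column player receives the negative of these values).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def regularised_self_play(steps=3000, eta=0.1, tau=0.2):
    x = np.array([0.8, 0.1, 0.1])   # deliberately skewed starting strategies
    y = np.array([0.1, 0.8, 0.1])
    ref = np.ones(3) / 3            # reference policy: uniform play
    for _ in range(steps):
        # Each player's payoffs against the opponent, minus a pull toward
        # the reference policy (the regularisation that damps the cycling).
        rx = A @ y - tau * np.log(x / ref)
        ry = -(A.T @ x) - tau * np.log(y / ref)
        # Multiplicative-weights (mirror-descent) update on the simplex.
        x = x * np.exp(eta * rx); x /= x.sum()
        y = y * np.exp(eta * ry); y /= y.sum()
    return x, y

x, y = regularised_self_play()
# Exploitability of x: how much a best response to it could still gain.
exploitability = float(max(-(x @ A)))
```

Both strategies end up very close to (1/3, 1/3, 1/3), so a best response gains essentially nothing: unexploitable play in the sense described above, here with randomisation doing the work of hiding intentions.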
To become hard to exploit, DeepNash developed an unpredictable strategy. This means creating initial deployments varied enough to prevent its opponent from spotting patterns over a series of games. And during the game phase, DeepNash randomises between seemingly equivalent actions to prevent exploitable tendencies.<\/p>\n<p data-block-key=\"6e84m\">Stratego players strive to be unpredictable, so there\u2019s value in keeping information hidden. DeepNash demonstrates how it values information in quite striking ways. In the example below, against a human player, DeepNash (blue) sacrificed, among other pieces, a 7 (Major) and an 8 (Colonel) early in the game and as a result was able to locate the opponent\u2019s 10 (Marshal), 9 (General), an 8 and two 7s.<\/p>\n<\/div>\n<figure class=\"single-media single-media--inline\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"vh32v\">In this early game situation, DeepNash (blue) has already located many of its opponent\u2019s most powerful pieces, while keeping its own key pieces secret.<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<p data-block-key=\"afzt2\">These efforts left DeepNash at a significant material disadvantage; it lost a 7 and an 8 while its human opponent preserved all their pieces ranked 7 and above. Nevertheless, having solid intel on its opponent\u2019s top brass, DeepNash evaluated its winning chances at 70% \u2013 and it won.<\/p>\n<h2 data-block-key=\"wnj5q\">The art of the bluff<\/h2>\n<p data-block-key=\"glzbi\">As in poker, a good Stratego player must sometimes represent strength, even when weak. DeepNash learned a variety of such bluffing tactics. In the example below, DeepNash uses a 2 (a weak Scout, unknown to its opponent) as if it were a high-ranking piece, pursuing its opponent\u2019s known 8. The human opponent decides the pursuer is most likely a 10, and so attempts to lure it into an ambush by their Spy. 
This tactic by DeepNash, risking only a minor piece, succeeds in flushing out and eliminating its opponent\u2019s Spy, a critical piece.<\/p>\n<\/div>\n<figure class=\"single-media single-media--inline\"><figcaption class=\"single-media__caption\">\n<p data-block-key=\"f849c\">The human player (red) is convinced the unknown piece chasing their 8 must be DeepNash\u2019s 10 (note: DeepNash had already lost its only 9).<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<p data-block-key=\"57qoj\">See more by watching these four videos of full-length games played by DeepNash against (anonymised) human experts: <a href=\"https:\/\/youtu.be\/HaUdWoSMjSY\" rel=\"noopener\" target=\"_blank\">Game 1<\/a>, <a href=\"https:\/\/youtu.be\/L-9ZXmyNKgs\" rel=\"noopener\" target=\"_blank\">Game 2<\/a>, <a href=\"https:\/\/youtu.be\/EOalLpAfDSs\" rel=\"noopener\" target=\"_blank\">Game 3<\/a>, <a href=\"https:\/\/youtu.be\/MhNoYl_g8mo\" rel=\"noopener\" target=\"_blank\">Game 4.<\/a><\/p>\n<\/div>\n<figure class=\"quote quote--inline\">\n<blockquote class=\"quote__text\">\n<p data-block-key=\"zljms\">The level of play of DeepNash surprised me. I had never heard of an artificial Stratego player that came close to the level needed to win a match against an experienced human player. But after playing against DeepNash myself, I wasn\u2019t surprised by the top-3 ranking it later achieved on the Gravon platform. 
I expect it would do very well if allowed to participate in the human World Championships.<\/p>\n<\/blockquote><figcaption class=\"quote__author\">\n<p data-block-key=\"ikujl\">Vincent de Boer, paper co-author and former Stratego World Champion<\/p>\n<\/figcaption><\/figure>\n<div class=\"gdm-rich-text rich-text\">\n<h2 data-block-key=\"kf4nl\">Future directions<\/h2>\n<p data-block-key=\"0mtif\">While we developed DeepNash for the highly defined world of Stratego, our novel R-NaD method can be directly applied to other two-player zero-sum games of both perfect or imperfect information. R-NaD has the potential to generalise far beyond two-player gaming settings to address large-scale real-world problems, which are often characterised by imperfect information and astronomical state spaces.<\/p>\n<p data-block-key=\"60i9t\">We also hope R-NaD can help unlock new applications of AI in domains that feature a large number of human or AI participants with different goals that might not have information about the intention of others or what\u2019s occurring in their environment, such as in the large-scale optimisation of traffic management to reduce driver journey times and the associated vehicle emissions.<\/p>\n<p data-block-key=\"nv7fi\">In creating a generalisable AI system that\u2019s robust in the face of uncertainty, we hope to bring the problem-solving capabilities of AI further into our inherently unpredictable world.<\/p>\n<p data-block-key=\"jfe6h\">Learn more about DeepNash by reading <a href=\"https:\/\/www.science.org\/stoken\/author-tokens\/ST-887\/full\" rel=\"noopener\" target=\"_blank\">our paper in Science<\/a>.<\/p>\n<p data-block-key=\"dh6fu\">For researchers interested in giving R-NaD a try or working with our newly proposed method, we\u2019ve open-sourced <a href=\"https:\/\/github.com\/deepmind\/open_spiel\/tree\/master\/open_spiel\/python\/algorithms\/rnad\" rel=\"noopener\" target=\"_blank\">our code<\/a>.<\/p>\n<\/div>\n<aside 
class=\"notes\">\n<div class=\"glue-page\">\n<div class=\"gdm-rich-text notes__inner\">\n<h2 data-block-key=\"gsghs\">Paper authors<\/h2>\n<p data-block-key=\"zixet\">Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls.<\/p>\n<\/div>\n<\/div>\n<\/aside>\n<\/div>\n<p><a href=\"https:\/\/deepmind.google\/discover\/blog\/mastering-stratego-the-classic-game-of-imperfect-information\/\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Research Published 1 December 2022 Authors Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian 
Strub<\/p>\n","protected":false},"author":2,"featured_media":767,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[],"class_list":["post-766","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-deepmind-ai"],"_links":{"self":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/766","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/comments?post=766"}],"version-history":[{"count":1,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/766\/revisions"}],"predecessor-version":[{"id":2691,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/posts\/766\/revisions\/2691"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media\/767"}],"wp:attachment":[{"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/media?parent=766"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/categories?post=766"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/todaysainews.com\/index.php\/wp-json\/wp\/v2\/tags?post=766"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}