Commentary: Heads-up limit hold'em poker is solved


Poker's cunning and deception have captured the imagination many times. In Casino Royale, James Bond beats a terrorist financier at the poker table. Bond's poker ability reflects his skill as a spy: he can spot lies and deceit, and think one step ahead of his opponent. As with other areas of human skill, poker has been affected by the rise of computers. A cluster of 4,800 CPUs running for 68 consecutive days has "solved" heads-up limit hold'em poker, the simplest form of poker played online and in casinos. The resulting computer strategy cannot be beaten, even over many years of play. This commentary discusses Bowling et al.'s winning strategy. Its goal is to ask: does the computer's strategy in the game's key first decision reflect the wisdom of poker experts, or does it play completely differently?


Games are a common test of the relative skill of computers and human experts. Garry Kasparov lost to Deep Blue in 1997, and Lee Sedol lost to AlphaGo in 2016. In 2008, a similar human-versus-computer match was held for heads-up limit hold'em (CPRG 2008). A team of seven human players took on "Polaris", a computer agent developed by researchers at the University of Alberta and a forerunner of the 2015 program. Polaris emerged as the overall winner. However, one of the human players, Matt Hawrilenko, was widely considered the best player of the game (Brodie 2008; Arnett 2009; Nalbone 2011).


Even a fairly simple card game between two players with a 52-card deck can become complex. This poker game has 3.16 × 10^17 game states (Bowling et al. 2015). Complex problems must be simplified in order to improve learning and performance (Dreyfus & Dreyfus 1987). Even for simpler problems, people rely on simple "heuristics" (Gigerenzer, Hertwig, et al. 1999; Hertwig et al. 2013). Poker theorists stress two principles above all: aggression and information hiding (Chen & Ankenman 2006). It is generally better to be aggressive and raise the stakes than to match them by calling. It is also usually better to hide information by playing many hands the same way than to have a unique strategy for each hand.


These principles matter, and they can be used to build a strategy for the first player in the first round. A player can play any hand by folding (putting no more money in and immediately forfeiting the hand), calling (matching the bet), or raising (doubling the bet). The strategy comes down to finding a single threshold: hands weaker than the threshold are folded, and stronger hands are raised ("raise-or-fold"). Calling, although possible, is not considered on the first decision; it may be considered later. The first player's first-round strategy must also cover a second scenario. If the second player re-raises, the first player again faces the fold, call, or raise trilemma. (If the second player folds, play ends immediately; if the second player calls, play continues to the second round.) Folding to a re-raise is not recommended from a risk/reward perspective (Sklansky 1999). Similar arguments suggest that raising again gains little, since a skilled opponent will never fold. In this situation, information hiding wins out over aggression, and always calling is the recommended play. The first player should therefore raise or fold based on initial hand strength and then always call a re-raise. The first-round strategy boils down to one hand: the worst hand worth raising. A simple rule is all that is needed for an effective strategy.
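The threshold rule described above can be written down in a few lines. The following is a minimal Python sketch, not code from any of the cited works. It assumes each starting hand has already been reduced to a strength score between 0 and 1; that score, the class name `ThresholdStrategy`, and the example threshold of 0.25 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdStrategy:
    """Raise-or-fold on the first decision, then always call a re-raise."""

    # Hypothetical parameter: strength of the worst hand worth raising.
    threshold: float

    def first_action(self, strength: float) -> str:
        # Hands at or above the threshold are raised, weaker hands are folded.
        # Calling is never chosen on the first decision.
        return "raise" if strength >= self.threshold else "fold"

    def facing_reraise(self, strength: float) -> str:
        # Every hand that raised calls a re-raise, whatever its strength,
        # so the response reveals nothing about the hand.
        return "call"


def raise_frequency(strategy: ThresholdStrategy, strengths: list[float]) -> float:
    """Fraction of the given hands that the strategy raises first in."""
    actions = [strategy.first_action(s) for s in strengths]
    return actions.count("raise") / len(actions)


# Illustrative usage with made-up strength scores 0.00, 0.01, ..., 0.99.
strategy = ThresholdStrategy(threshold=0.25)
strengths = [i / 100 for i in range(100)]
print(strategy.first_action(0.10))
print(strategy.first_action(0.80))
print(strategy.facing_reraise(0.80))
print(raise_frequency(strategy, strengths))
```

The whole first-round strategy is carried by the single `threshold` value, which is the point of the argument: moving the threshold changes how often the player raises, but not the shape of the strategy.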


Matt Hawrilenko used this strategy during the 2008 match (Newall 2011). Over 1,000 hands, he raised 86.8% of the time and folded otherwise. He always called when facing a re-raise. Computers are not subject to the same computational limitations as humans, so it is not surprising that the computer agent, Polaris 2008, used a similar but more complicated strategy. Polaris raised 85.0% of the time, 1.8 percentage points less than Hawrilenko. However, it also called 2.4% of the time. Facing a re-raise, Polaris called 83.6% of the time, compared with Hawrilenko's 100%.


Polaris in 2008 was significantly less skilled at poker than "Cepheus", the 2015 program, and required fewer computational resources (Bowling et al. 2015). How does Cepheus compare? Surprisingly, the more sophisticated computer agent uses a simpler strategy. The observed behavior of the three strategies is shown in Table 1, which combines data from Newall (2011) and Bowling et al. (2015). Cepheus initially raises 82.54% of the time and calls a tiny 0.06%, far less often than Polaris and closer to Hawrilenko. Facing a re-raise, Cepheus calls 99.1% of the time, again closer to Hawrilenko than to Polaris. A human player would struggle to replicate these rare plays. According to one co-author of the study, Cepheus's deviations from Hawrilenko's simple strategy are "most likely part and parcel of the noise that makes it ‘essentially’ solved and not just solvable" (Burch 2015).
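The comparison can be checked with simple arithmetic on the percentages quoted in the text. The short Python sketch below is only an illustration: the dictionary layout and function names are hypothetical, Hawrilenko's initial calling frequency is taken as 0% because he only raised or folded, and the fold frequencies are derived on the assumption that raising, calling, and folding together account for 100% of first decisions.

```python
# Percentages quoted in the text for the first player in the first round.
FREQUENCIES = {
    "Hawrilenko": {"raise": 86.8, "call": 0.0, "call_vs_reraise": 100.0},
    "Polaris": {"raise": 85.0, "call": 2.4, "call_vs_reraise": 83.6},
    "Cepheus": {"raise": 82.54, "call": 0.06, "call_vs_reraise": 99.1},
}


def fold_frequency(agent: str) -> float:
    """Derived value: whatever is not raised or called is folded."""
    row = FREQUENCIES[agent]
    return round(100.0 - row["raise"] - row["call"], 2)


def gap(agent_a: str, agent_b: str, key: str) -> float:
    """Absolute difference, in percentage points, on one frequency."""
    return round(abs(FREQUENCIES[agent_a][key] - FREQUENCIES[agent_b][key]), 2)


for key in ("call", "call_vs_reraise"):
    to_human = gap("Cepheus", "Hawrilenko", key)
    to_polaris = gap("Cepheus", "Polaris", key)
    print(f"{key}: Cepheus is {to_human} points from Hawrilenko, "
          f"{to_polaris} points from Polaris")

for agent in FREQUENCIES:
    print(f"{agent} folds {fold_frequency(agent)}% of first decisions")
```

On both calling measures, the gap between Cepheus and Hawrilenko is under one percentage point, while the gap between Cepheus and Polaris is larger.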
