Poker, and Texas Hold'em in particular, has long been a typical research target
within imperfect-information games (IIGs), which have in turn served as a
benchmark of artificial intelligence (AI) development. Representative prior
works such as DeepStack and Libratus rely heavily on counterfactual regret
minimization (CFR) to tackle heads-up no-limit poker. However, it is
challenging for subsequent researchers to learn from these CFR-based models and
apply CFR to other real-world applications because CFR iterations are
computationally expensive. Additionally, CFR is difficult to apply to
multi-player games because the size of the game tree grows exponentially. In
this work, we introduce PokerGPT, an end-to-end solver built on a lightweight
large language model (LLM) that plays Texas Hold'em with an arbitrary number of
players and achieves high win rates. PokerGPT requires only simple textual
information about a poker game to generate decision-making advice, which makes
interaction between AI and humans convenient. We transform a set of textual
records acquired from real games into prompts and use them to fine-tune a
lightweight pre-trained LLM with reinforcement learning from human feedback
(RLHF). To improve fine-tuning performance, we apply prompt engineering to the
raw data: filtering useful information, selecting the behaviors of players with
high win rates, and processing them into textual instructions with multiple
prompt engineering techniques. Experiments show that PokerGPT outperforms
previous approaches in terms of win rate, model size, training time, and
response speed, indicating the great potential of LLMs for solving IIGs.
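
The data preparation described above (keeping only the behavior of high-win-rate players and turning textual game records into prompts) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PokerGPT's actual pipeline: the `HandRecord` schema, the `win_rates` and `build_examples` helpers, the prompt template, and the win-rate threshold are all hypothetical, and the RLHF fine-tuning step itself is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HandRecord:
    """One decision parsed from a textual game log (hypothetical schema)."""
    player: str
    position: str
    hole_cards: str
    board: str      # empty string before the flop
    pot: int
    to_call: int
    action: str     # the action the player actually took, e.g. "raise 300"


def win_rates(results: dict[str, list[int]], big_blind: int) -> dict[str, float]:
    """Average big blinds won per hand for each player (hypothetical selection metric)."""
    return {
        player: sum(chips) / big_blind / len(chips)
        for player, chips in results.items()
        if chips
    }


def build_examples(records: list[HandRecord], rates: dict[str, float],
                   min_rate: float) -> list[dict[str, str]]:
    """Keep decisions by players at or above `min_rate`; format them as prompt/response pairs."""
    examples = []
    for r in records:
        # Players with no recorded results are treated as below the threshold.
        if rates.get(r.player, float("-inf")) < min_rate:
            continue
        prompt = (
            f"Position: {r.position}. Hole cards: {r.hole_cards}. "
            f"Board: {r.board or 'none'}. Pot: {r.pot}. To call: {r.to_call}. "
            "What action should I take?"
        )
        examples.append({"prompt": prompt, "response": r.action})
    return examples


if __name__ == "__main__":
    records = [
        HandRecord("alice", "BTN", "Ah Kd", "", 150, 100, "raise 300"),
        HandRecord("bob", "SB", "7c 2d", "", 150, 50, "call"),
    ]
    rates = win_rates({"alice": [200, -100, 400], "bob": [-300, -100]}, big_blind=100)
    for ex in build_examples(records, rates, min_rate=0.0):
        print(ex["prompt"], "->", ex["response"])
```

Here only the winning player's decision survives the filter; the resulting prompt/response pairs are the kind of textual instructions a fine-tuning stage could consume.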

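PokerGPT converts real game records into prompts and fine-tunes an LLM with reinforcement learning from human feedback, addressing imperfect-information games such as Texas Hold'em, and outperforms previous methods in win rate, model size, training time, and response speed.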

PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model

An imperfect-information game is a type of game with asymmetric information.
Such games are more common in real life than perfect-information games.
Artificial intelligence (AI) for imperfect-information games such as poker has
made considerable progress and achieved notable success in recent years. The
success of superhuman poker AIs such as Libratus and DeepStack has drawn
researchers' attention to poker. However, the lack of open-source code limits
the development of Texas hold'em AI to some extent. This article introduces
DecisionHoldem, a high-level AI for heads-up no-limit Texas hold'em that uses
safe depth-limited subgame solving, considering the possible ranges of the
opponent's private hands to reduce the exploitability of its strategy.
Experimental results show that DecisionHoldem defeats Slumbot, the strongest
openly available agent in heads-up no-limit Texas hold'em poker, by more than
730 mbb/h, and Openstack, a high-level reproduction of DeepStack, by more than
700 mbb/h (mbb/h: thousandths of a big blind per hand). Moreover, we release
the source code and tools of DecisionHoldem to promote AI development in
imperfect-information games.
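
To make the mbb/h unit concrete: a win rate in milli-big-blinds per hand is the total chips won, divided by the big blind, multiplied by 1,000, and divided by the number of hands played. The helpers below are a small illustrative sketch; the function names and the example stakes are assumptions, not part of DecisionHoldem's released tools.

```python
def mbb_per_hand(chips_won: float, big_blind: float, hands: int) -> float:
    """Win rate in milli-big-blinds per hand (mbb/h)."""
    if hands <= 0 or big_blind <= 0:
        raise ValueError("hands and big_blind must be positive")
    # (chips_won / big_blind) big blinds, scaled by 1000, averaged over hands.
    return chips_won * 1000 / (big_blind * hands)


def chips_for_rate(mbb_h: float, big_blind: float, hands: int) -> float:
    """Inverse conversion: total chips implied by an mbb/h rate over `hands` hands."""
    return mbb_h * big_blind * hands / 1000
```

For instance, with a 100-chip big blind, a margin of 730 mbb/h sustained over 10,000 hands corresponds to 730,000 chips, i.e. 7,300 big blinds, or 0.73 big blinds per hand.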

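This paper introduces DecisionHoldem, a high-level AI that uses safe depth-limited subgame solving, which accounts for the opponent's possible hand ranges, to reduce the exploitability of its strategy. Experimental results show that DecisionHoldem defeats Slumbot, the strongest openly available agent in heads-up no-limit Texas hold'em poker, and Openstack, a high-level reproduction of DeepStack, by more than 730 mbb/h and 700 mbb/h, respectively. In addition, we release the source code and tools of DecisionHoldem to promote AI development in imperfect-information games.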

DecisionHoldem: Safe Depth-Limited Solving With Diverse Opponents for Imperfect-Information Games

Poker is a challenging problem for artificial intelligence, with
non-deterministic dynamics, partial observability, and the added difficulty of
unknown adversaries. Modelling all of the uncertainties in this domain is not
an easy task. In this paper we present a Bayesian probabilistic model for a
broad class of poker games, separating the uncertainty in the game dynamics
from the uncertainty of the opponent's strategy. We then describe approaches to
two key subproblems: (i) inferring a posterior over opponent strategies given a
prior distribution and observations of their play, and (ii) playing an
appropriate response to that distribution. We demonstrate the overall approach
on a reduced version of poker using Dirichlet priors and then on the full game
of Texas hold'em using a more informed prior. We demonstrate methods for
playing effective responses to the opponent, based on the posterior.
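
For a single decision point, the two subproblems above can be illustrated with a conjugate Dirichlet-multinomial update followed by a response to the posterior. This is a simplified sketch under assumptions, not the paper's algorithm: it models one opponent information set with three hypothetical actions, uses made-up payoffs, and best-responds to the posterior mean, which matches the Bayesian best response here only because expected payoff at a single decision is linear in the opponent's action probabilities. The full game requires reasoning over many information sets, which this sketch does not attempt. A Thompson-style alternative (best-responding to one sampled strategy) is included for comparison.

```python
from __future__ import annotations

import random

ACTIONS = ("fold", "call", "raise")  # hypothetical opponent actions at one information set


def dirichlet_posterior(prior_alpha: list[float], counts: list[int]) -> list[float]:
    """Dirichlet prior + multinomial observations -> Dirichlet posterior (add counts to pseudo-counts)."""
    return [a + c for a, c in zip(prior_alpha, counts)]


def posterior_mean(alpha: list[float]) -> list[float]:
    """Expected opponent action probabilities under Dirichlet(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def sample_dirichlet(alpha: list[float], rng: random.Random) -> list[float]:
    """Draw one opponent strategy from Dirichlet(alpha) via normalized Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def best_response(opp_strategy: list[float],
                  payoff: list[list[float]]) -> tuple[int, list[float]]:
    """Our action with the highest expected payoff against a mixed opponent strategy.

    payoff[i][j] is our (made-up) payoff when we take action i and the opponent takes ACTIONS[j].
    """
    values = [sum(p * u for p, u in zip(opp_strategy, row)) for row in payoff]
    best = max(range(len(values)), key=lambda i: values[i])
    return best, values


# Hypothetical payoffs for our two options when facing this opponent decision.
OUR_ACTIONS = ("bet", "check")
PAYOFF = [
    [2.0, -1.0, -3.0],  # bet: win if they fold, lose if they call or raise
    [0.0, 0.0, 0.0],    # check: treated as neutral in this toy example
]

if __name__ == "__main__":
    prior = [1.0, 1.0, 1.0]  # uniform Dirichlet prior
    for counts in ([2, 5, 1], [6, 1, 0]):
        mean = posterior_mean(dirichlet_posterior(prior, counts))
        choice, _ = best_response(mean, PAYOFF)
        print(counts, {a: round(p, 3) for a, p in zip(ACTIONS, mean)}, "->", OUR_ACTIONS[choice])
    # Thompson-style alternative: best-respond to one sampled strategy instead of the mean.
    sampled = sample_dirichlet(dirichlet_posterior(prior, [6, 1, 0]), random.Random(0))
    print("sampled ->", OUR_ACTIONS[best_response(sampled, PAYOFF)[0]])
```

With the uniform prior, observing mostly calls (counts [2, 5, 1]) makes checking the better reply, while observing mostly folds ([6, 1, 0]) makes the bet profitable in expectation, so the response adapts as the posterior shifts.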

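This paper proposes a Bayesian probabilistic model for poker that separates the uncertainty in the game dynamics from the uncertainty of the opponent's strategy, uses Dirichlet priors over the opponent's strategy probabilities, derives effective responses to the posterior distribution over opponents, and applies the approach to Texas hold'em.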