We study an online learning problem in general-sum Stackelberg games, where
players act in a decentralized and strategic manner. We consider two settings
that differ in the information available to the follower: (1) the limited
information setting where the follower only observes its own reward, and (2)
the side information setting where the follower has extra side information
about the leader's reward. We show that for the follower, myopically best
responding to the leader's action is the best strategy for the limited
information setting, but not necessarily for the side information setting:
there, the follower can manipulate the leader's reward signals with strategic
actions, and hence induce the leader's strategy to converge to an equilibrium
that is more favorable to the follower. Based on these insights, we study decentralized online
learning for both players in the two settings. Our main contribution is to
derive last-iterate convergence and sample complexity results in both settings.
Notably, we design a new manipulation strategy for the follower in the latter
setting, and show that it has an intrinsic advantage over the best response
strategy. Our theoretical results are also supported by empirical results.
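
The gap between best responding and manipulating can be illustrated with a toy example. The sketch below is not the paper's algorithm: the 2x2 payoff tables are made up, and the leader is modeled as simply committing to the action with the highest reward it observes, which stands in for a learner that has already converged. All function names are hypothetical.

```python
from itertools import product

# Hypothetical payoffs, indexed as [leader_action][follower_action].
LEADER_REWARD = [[3, 0], [2, 1]]
FOLLOWER_REWARD = [[1, 0], [3, 2]]


def leader_choice(response):
    """Leader action with the highest observed reward, given the
    follower's response map (response[a] is the reply to action a)."""
    return max(range(len(LEADER_REWARD)),
               key=lambda a: LEADER_REWARD[a][response[a]])


def best_response_map():
    """Myopic follower: reply to each leader action with its own best action."""
    return tuple(max(range(len(row)), key=lambda b: row[b])
                 for row in FOLLOWER_REWARD)


def best_manipulation_map():
    """Follower with side information about LEADER_REWARD: enumerate all
    response maps and keep the one that induces its preferred outcome."""
    n_leader = len(LEADER_REWARD)
    n_follower = len(LEADER_REWARD[0])

    def follower_value(response):
        a = leader_choice(response)
        return FOLLOWER_REWARD[a][response[a]]

    return max(product(range(n_follower), repeat=n_leader), key=follower_value)


def outcome(response):
    """Return (leader action, follower action, follower reward)."""
    a = leader_choice(response)
    return a, response[a], FOLLOWER_REWARD[a][response[a]]


print(outcome(best_response_map()))      # (0, 0, 1)
print(outcome(best_manipulation_map()))  # (1, 0, 3)
```

In this toy game the manipulating follower answers leader action 0 with a reply that hurts both players, so the leader is steered to action 1, where the follower earns 3 instead of 1. The brute-force enumeration only works for tiny games and is meant to show the incentive, not a practical strategy.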

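We study a decentralized and strategic online learning problem. By examining
two settings, limited information and side information, we find that in the
limited information setting the follower myopically best responds to the
leader's actions, whereas in the side information setting the follower can
manipulate the leader's reward signals through strategic actions so that the
leader's strategy converges to an equilibrium more favorable to the follower.
Based on these insights, we study decentralized online learning in both
settings; our main contribution is a set of last-iterate convergence and sample
complexity results. Notably, we design a new manipulation strategy for the side
information setting and show that it has an intrinsic advantage over the best
response strategy. Our theoretical results are also supported by empirical
results.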

Decentralized Online Learning in General-Sum Stackelberg Games

We consider the problem of third-person imitation learning with the
additional challenge that the learner must select the perspective from which
they observe the expert. In our setting, each perspective provides only limited
information about the expert's behavior, and the learning agent must carefully
select and combine information from different perspectives to achieve
competitive performance. This setting is inspired by real-world imitation
learning applications, e.g., in robotics, a robot might observe a human
demonstrator via camera and receive information from different perspectives
depending on the camera's position. We formalize the aforementioned active
third-person imitation learning problem, theoretically analyze its
characteristics, and propose a generative adversarial network-based active
learning approach. Empirically, we demonstrate that our proposed approach can
effectively learn from expert demonstrations, and we explore the importance of
different architectural choices for the learner's performance.

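We study third-person imitation learning in which the learner selects
appropriate perspectives to capture the expert's behavior from limited
information. We propose a generative adversarial network-based active learning
approach, analyze the problem's characteristics theoretically, and study
empirically which choices matter for the learner's performance.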

Active Third-Person Imitation Learning

How can an informed sender persuade a receiver when the sender has only limited
information about the receiver's beliefs? Motivated by research showing
generative AI can simulate economic agents, we initiate the study of
information design with an oracle. We assume the sender can learn more about
the receiver by querying this oracle, e.g., by simulating the receiver's
behavior. Aside from AI motivations such as general-purpose Large Language
Models (LLMs) and problem-specific machine learning models, alternate
motivations include customer surveys and querying a small pool of live users.
Specifically, we study Bayesian Persuasion where the sender has a
second-order prior over the receiver's beliefs. After a fixed number of queries
to an oracle to refine this prior, the sender commits to an information
structure. Upon receiving the message, the receiver takes a payoff-relevant
action maximizing her expected utility given her posterior beliefs. We design
polynomial-time querying algorithms that optimize the sender's expected utility
in this Bayesian Persuasion game. As a technical contribution, we show that
queries form partitions of the space of receiver beliefs that can be used to
quantify the sender's knowledge.
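
To make the role of queries concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: a binary state (good or bad), a receiver who acts only if her posterior on the good state is at least 1/2 (ties broken in the sender's favor), a sender who always wants her to act, and an oracle that answers threshold questions of the form "is the receiver's prior at least t?". The sender recommends acting with probability 1 in the good state and with probability q in the bad state. All names are hypothetical.

```python
from fractions import Fraction as F


def max_persuasion_prob(mu):
    """Largest q for which a receiver with prior mu on the good state
    still follows the recommendation: mu / (mu + (1 - mu) * q) >= 1/2."""
    return F(1) if mu >= F(1, 2) else mu / (1 - mu)


def sender_value(q, second_order_prior, sender_prior):
    """Sender's expected utility for scheme q. second_order_prior maps each
    candidate receiver prior to its probability; sender_prior is the
    sender's own probability of the good state."""
    follow_mass = sum(w for mu, w in second_order_prior.items()
                      if q <= max_persuasion_prob(mu))
    return follow_mass * (sender_prior + (1 - sender_prior) * q)


def best_scheme(second_order_prior, sender_prior):
    """Return (value, q); an optimal q is one of the obedience thresholds."""
    candidates = {max_persuasion_prob(mu) for mu in second_order_prior}
    return max((sender_value(q, second_order_prior, sender_prior), q)
               for q in candidates)


def refine(second_order_prior, threshold, answer):
    """Keep the cell of the partition {mu >= threshold, mu < threshold}
    that matches the oracle's answer, then renormalize."""
    cell = {mu: w for mu, w in second_order_prior.items()
            if (mu >= threshold) == answer}
    total = sum(cell.values())
    return {mu: w / total for mu, w in cell.items()}


# Made-up numbers: the receiver's prior is 1/5 or 2/5, equally likely.
prior = {F(1, 5): F(1, 2), F(2, 5): F(1, 2)}
print(best_scheme(prior, F(3, 10)))
print(best_scheme(refine(prior, F(3, 10), True), F(3, 10)))
```

With these numbers the sender's best expected utility is 19/40 before any query and 23/30 once the oracle confirms the receiver's prior is at least 3/10. Each threshold query splits the candidate beliefs into two cells, which is a one-dimensional analogue of the partitions of the belief space mentioned above.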

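Using oracle queries to learn about the receiver's beliefs, we study
information design and Bayesian persuasion, and design polynomial-time querying
algorithms that optimize the sender's expected utility.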