A general model of decentralized stochastic control, called the partial history
sharing information structure, is presented. In this model, at each step the
controllers share part of their observation and control history with each
other. This general model subsumes several existing models of information
sharing as special cases. Based on the information commonly known to all the
controllers, the decentralized problem is reformulated as an equivalent
centralized problem from the perspective of a coordinator. The coordinator
knows the common information and selects prescriptions that map each
controller's local information to its control actions. The optimal control
problem at the coordinator is shown to be a partially observable Markov
decision process (POMDP), which is solved using techniques from Markov decision
theory. This approach provides (a) structural results for optimal strategies,
and (b) a dynamic program for obtaining optimal strategies for all controllers
in the original decentralized problem. Thus, this approach unifies the various
ad-hoc approaches taken in the literature. In addition, the structural results
on optimal control strategies obtained by the proposed approach cannot be
obtained by the existing generic approach (the person-by-person approach) for
obtaining structural results in decentralized problems; and the dynamic program
obtained by the proposed approach is simpler than that obtained by the existing
generic approach (the designer's approach) for obtaining dynamic programs in
decentralized problems.

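This study proposes a general model of decentralized stochastic control, called the partial history sharing information structure. In this model, at each time step the controllers share part of their observation and control histories. Based on the information commonly known to all controllers, the decentralized problem is reformulated as an equivalent centralized problem from the perspective of a coordinator, and a method for solving this equivalent problem is given. Compared with existing methods, this approach is simpler and more unified, and it yields better structural results and dynamic programs.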

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

The importance of a node in a directed graph can be measured by its PageRank.
The PageRank of a node is used in a number of application contexts, including
ranking websites, and can be interpreted as the average portion of time spent
at the node by an infinite random walk. We consider the problem of maximizing
the PageRank of a node by selecting some of the edges from a set of edges that
are under our control. By applying results from Markov decision theory, we show
that an optimal solution to this problem can be found in polynomial time. Our
core solution results in a linear programming formulation, but we also provide
an alternative greedy algorithm, a variant of policy iteration, which also runs
in polynomial time. Finally, we show that, under a slight modification
in which we are given mutually exclusive pairs of edges, the problem of
PageRank optimization becomes NP-hard.
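
To make the problem statement concrete, here is a minimal Python sketch. It is not the linear programming formulation or the policy-iteration variant from the abstract. It computes PageRank by power iteration and then searches every subset of the controllable edges by brute force, which takes exponential time and is only usable on toy graphs. The damping factor of 0.85, the uniform handling of dangling nodes, and the names `pagerank` and `best_edge_subset` are all assumptions made for illustration.

```python
from itertools import combinations


def pagerank(out_edges, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a directed graph.

    out_edges maps every node to the list of nodes it links to.
    Assumption: nodes without out-links spread their mass uniformly.
    """
    nodes = sorted(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_edges[v])
        # Teleportation mass plus the share redistributed from dangling nodes.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_edges[v]:
                new[w] += damping * rank[v] / len(out_edges[v])
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


def best_edge_subset(fixed, optional, target):
    """Brute-force search over subsets of the controllable edges.

    fixed:    dict node -> list of successors (edges we cannot change)
    optional: list of (u, w) edges we may add; assumed not already in fixed
    Returns (best PageRank of target, tuple of chosen edges).
    """
    best_score, best_subset = -1.0, ()
    for k in range(len(optional) + 1):
        for subset in combinations(optional, k):
            graph = {v: list(ws) for v, ws in fixed.items()}
            for u, w in subset:
                graph[u].append(w)
            score = pagerank(graph)[target]
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset


if __name__ == "__main__":
    fixed = {"a": ["b"], "b": ["a"], "c": ["a"]}
    optional = [("b", "c"), ("c", "b")]
    print(best_edge_subset(fixed, optional, "a"))
```

The search space doubles with every controllable edge, which is why a polynomial-time solution for the unconstrained problem is a meaningful result.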

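This paper shows how to optimize the PageRank of a node by controlling the edges between nodes. The core methods are a linear programming formulation and a greedy algorithm; when mutually exclusive pairs of edges are given, the problem is NP-hard.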

PageRank Optimization by Edge Selection

This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
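
As a small illustration of the exploration-exploitation trade-off named above, the following Python sketch runs an epsilon-greedy agent on a multi-armed bandit. It is not taken from the survey: the Gaussian reward model with unit variance, the parameter values, and the function name `epsilon_greedy_bandit` are assumptions chosen to keep the example short.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a k-armed bandit.

    true_means: hypothetical mean reward of each arm (hidden from the agent).
    With probability epsilon the agent explores a uniformly random arm;
    otherwise it exploits the arm with the highest current estimate.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running average reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # assumed unit-variance noise
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total / steps


if __name__ == "__main__":
    estimates, counts, average = epsilon_greedy_bandit([0.1, 0.5, 0.9])
    print("estimates:", [round(e, 2) for e in estimates])
    print("pulls per arm:", counts)
    print("average reward:", round(average, 3))
```

Setting `epsilon` to 0 makes the agent purely greedy, so it can lock onto a poor arm early; larger values spend more pulls on arms already known to be worse.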

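This paper surveys the field of reinforcement learning from a computer-science perspective, covering its history, current work, and practical applications. It focuses on central issues of reinforcement learning, such as balancing exploration and exploitation, Markov decision theory, and learning from delayed reinforcement.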