The interaction between an artificial agent and its environment is
bi-directional. The agent extracts relevant information from the environment
and, in return, affects the environment through its actions so as to accumulate
high expected reward. Standard reinforcement learning (RL) deals with
expected-reward maximization. However, there are always information-theoretic
limitations that restrict the expected reward, and standard RL does not
properly account for them. In this work we consider RL objectives with
information-theoretic limitations. For the first time we derive a Bellman-type
recursive equation for the causal information between the environment and the
agent, which is plausibly combined with the Bellman recursion for the value
function. The unified equation serves to explore the typical behavior of
artificial agents over an infinite time horizon.
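
For orientation, the standard Bellman recursion for the value function, which the abstract says is combined with the new recursion for causal information, can be sketched as plain value iteration. This is only the textbook value recursion under assumed inputs, not the unified equation derived in the paper: the function name `value_iteration`, the transition layout `P`, the reward table `R`, the discount `gamma`, and the toy two-state problem are all hypothetical choices made for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate V(s) <- max_a [ R[s][a] + gamma * sum_s' P(s'|s,a) V(s') ].

    P[s][a] is a list of (probability, next_state) pairs (assumed layout).
    R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [
            max(
                R[s][a] + gamma * sum(prob * V[s_next] for prob, s_next in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical two-state problem: action 0 stays put, action 1 switches state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]
V = value_iteration(P, R, gamma=0.9)
```

In this toy problem the best policy moves to state 1 and stays there, so the values approach 1/(1 - 0.9) = 10 for state 1 and 0.9 * 10 = 9 for state 0. An information-constrained objective of the kind the abstract describes would add a second recursive quantity alongside `V`; that part is not reproduced here.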

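This work studies the interaction between an artificial agent and its environment, examining how, under information-theoretic limitations, reinforcement learning lets the agent maximize expected reward over an infinite time horizon. It derives, for the first time, a Bellman recursive equation for the causal information between the environment and the agent, to be used together with the Bellman recursive equation for the value function.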

A Unified Bellman Equation for Causal Information and Value in Markov Decision Processes

The problem of graphical model selection is to correctly estimate the graph
structure of a Markov random field given samples from the underlying
distribution. We analyze the information-theoretic limitations of the problem
of graph selection for binary Markov random fields under high-dimensional
scaling, in which the graph size $p$ and the number of edges $k$, and/or the
maximal node degree $d$ are allowed to increase to infinity as a function of
the sample size $n$. For pairwise binary Markov random fields, we derive both
necessary and sufficient conditions for correct graph selection over the class
$\mathcal{G}_{p,k}$ of graphs on $p$ vertices with at most $k$ edges, and over
the class $\mathcal{G}_{p,d}$ of graphs on $p$ vertices with maximum degree at
most $d$. For the class $\mathcal{G}_{p, k}$, we establish the existence of
constants $c$ and $c'$ such that if $n < c k \log p$, any method has
error probability at least 1/2 uniformly over the family, and we demonstrate a
graph decoder that succeeds with high probability uniformly over the family for
sample sizes $n > c' k^2 \log p$. Similarly, for the class
$\mathcal{G}_{p,d}$, we exhibit constants $c$ and $c'$ such that for $n < c d^2
\log p$, any method fails with probability at least 1/2, and we demonstrate a
graph decoder that succeeds with high probability for $n > c' d^3 \log p$.
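
To make the scaling concrete, the thresholds stated above can be evaluated for given $p$, $k$, and $d$. The abstract only asserts that constants $c$ and $c'$ exist and does not give their values, so the sketch below uses $c = c' = 1$ purely as placeholders; the function names are hypothetical, and the outputs show how the two thresholds scale relative to each other, not actual sample-size requirements.

```python
import math


def edge_class_thresholds(p, k, c=1.0, c_prime=1.0):
    """Thresholds for graphs on p vertices with at most k edges.

    Returns (necessary, sufficient): below c * k * log p any method fails
    with probability at least 1/2; above c' * k^2 * log p the decoder succeeds.
    c and c_prime are placeholder constants, not values from the paper.
    """
    log_p = math.log(p)
    return c * k * log_p, c_prime * k**2 * log_p


def degree_class_thresholds(p, d, c=1.0, c_prime=1.0):
    """Thresholds for graphs on p vertices with maximum degree at most d.

    Returns (necessary, sufficient) = (c * d^2 * log p, c' * d^3 * log p),
    again with placeholder constants.
    """
    log_p = math.log(p)
    return c * d**2 * log_p, c_prime * d**3 * log_p


# Example sizes chosen only for illustration.
edge_lower, edge_upper = edge_class_thresholds(p=100, k=10)
deg_lower, deg_upper = degree_class_thresholds(p=100, d=5)
```

With equal placeholder constants, the sufficient threshold exceeds the necessary one by a factor of $k$ for the edge-bounded class and by a factor of $d$ for the degree-bounded class, which is the gap between the two conditions that the abstract states.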

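This work studies the information-theoretic limitations of graph selection for binary Markov random fields in the high-dimensional setting. It gives necessary and sufficient conditions for correct graph selection over the class $\mathcal{G}_{p,k}$ of graphs on $p$ vertices with at most $k$ edges and over the class $\mathcal{G}_{p,d}$ of graphs on $p$ vertices with maximum degree at most $d$, and it constructs a graph decoder that succeeds for sample sizes $n > c' k^2 \log p$ and $n > c' d^3 \log p$, respectively.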