We provide faster randomized algorithms for computing an $\epsilon$-optimal
policy in a discounted Markov decision process with
$A_{\text{tot}}$ state-action pairs, bounded rewards, and discount factor
$\gamma$. We provide an $\tilde{O}(A_{\text{tot}}[(1 -
\gamma)^{-3}\epsilon^{-2} + (1 - \gamma)^{-2}])$-time algorithm in the sampling
setting, where the probability transition matrix is unknown but accessible
through a generative model which can be queried in $\tilde{O}(1)$-time, and an
$\tilde{O}(s + A_{\text{tot}} (1-\gamma)^{-2})$-time algorithm in the offline setting where
the probability transition matrix is known and $s$-sparse. These results
improve upon the prior state-of-the-art which either ran in
$\tilde{O}(A_{\text{tot}}[(1 - \gamma)^{-3}\epsilon^{-2} + (1 - \gamma)^{-3}])$
time [Sidford, Wang, Wu, Ye 2018] in the sampling setting, $\tilde{O}(s +
A_{\text{tot}} (1-\gamma)^{-3})$ time [Sidford, Wang, Wu, Yang, Ye 2018] in the
offline setting, or time at least quadratic in the number of states using
interior point methods for linear programming. We achieve our results by
building upon prior stochastic variance-reduced value iteration methods
[Sidford, Wang, Wu, Yang, Ye 2018]. We provide a variant that carefully
truncates the progress of its iterates to improve the variance of new
variance-reduced sampling procedures that we introduce to implement the steps.
Our method is essentially model-free and can be implemented in
$\tilde{O}(A_{\text{tot}})$-space when given generative model access.
Consequently, our results take a step in closing the sample-complexity gap
between model-free and model-based methods.
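
As a rough illustration of the variance-reduction idea that the abstract builds on, the following Python sketch estimates $P v_0$ accurately once per epoch for an anchor $v_0$, then uses only a few samples per step to estimate $P(v - v_0)$. It is a simplified sketch under stated assumptions, not the algorithm of the paper: it omits the truncation step and all error-correction terms, and the function names, sample counts, and the toy MDP in the usage note are hypothetical choices made for illustration. The matrix `P` is touched only through sampling, which stands in for the generative model.

```python
import random


def sample_mean(rng, probs, values, m):
    """Monte Carlo estimate of sum_j probs[j] * values[j] from m sampled next states."""
    draws = rng.choices(range(len(probs)), weights=probs, k=m)
    return sum(values[j] for j in draws) / m


def variance_reduced_vi(P, r, gamma, epochs=5, inner=10,
                        m_anchor=5000, m_inner=200, seed=0):
    """Simplified variance-reduced value iteration (no truncation, illustrative only).

    P[s][a] is a list of next-state probabilities, used only to simulate
    generative-model queries; r[s][a] is a bounded reward.
    """
    rng = random.Random(seed)
    n = len(P)
    v = [0.0] * n
    for _ in range(epochs):
        v0 = list(v)  # anchor for this epoch
        # Expensive step: many samples per (s, a) to estimate (P v0)(s, a).
        x = [[sample_mean(rng, P[s][a], v0, m_anchor)
              for a in range(len(P[s]))] for s in range(n)]
        for _ in range(inner):
            # Cheap step: v - v0 is small, so few samples suffice for P (v - v0).
            diff = [v[j] - v0[j] for j in range(n)]
            v = [max(r[s][a] + gamma * (x[s][a]
                                        + sample_mean(rng, P[s][a], diff, m_inner))
                     for a in range(len(P[s])))
                 for s in range(n)]
    return v


def exact_vi(P, r, gamma, iters=200):
    """Reference value iteration with the known transition matrix (offline setting)."""
    n = len(P)
    v = [0.0] * n
    for _ in range(iters):
        v = [max(r[s][a] + gamma * sum(p * v[j] for j, p in enumerate(P[s][a]))
                 for a in range(len(P[s])))
             for s in range(n)]
    return v
```

On a hypothetical two-state, two-action MDP such as `P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]` with `r = [[0.0, 0.5], [1.0, 0.3]]` and `gamma = 0.5`, the sampled values returned by `variance_reduced_vi` can be compared against `exact_vi` to see how closely the estimates track the true optimal values.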

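We provide a faster randomized algorithm for computing an $\epsilon$-optimal policy in a discounted Markov decision process with finitely many state-action pairs, bounded rewards, and a discount factor. By giving algorithms with different running times for the sampling setting and the offline setting, we improve upon the prior state of the art. Our method builds on prior stochastic variance-reduced value iteration methods; by introducing new variance-reduced sampling procedures and refining the progress of its iterates, it can be implemented without a model and helps close the sample-complexity gap between model-free and model-based methods.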

Truncated Variance Reduced Value Iteration

Influence maximization is the task of selecting a small number of seed nodes
in a social network to maximize the influence spread from these seeds. It has
been widely investigated in the past two decades. In the canonical setting, the
social network and its diffusion parameters are given as input. In this paper,
we consider the more realistic sampling setting where the network is unknown
and we only have a set of passively observed cascades that record the sets of
activated nodes at each diffusion step. We study the task of influence
maximization from these cascade samples (IMS) and present constant
approximation algorithms for it under mild conditions on the seed set
distribution. To achieve the optimization goal, we also provide a novel
solution to the network inference problem, that is, learning diffusion
parameters and the network structure from the cascade data. Compared with prior
solutions, our network inference algorithms require weaker assumptions and do
not rely on maximum-likelihood estimation and convex programming. Our IMS
algorithms enhance the learning-and-then-optimization approach by allowing a
constant approximation ratio even when the diffusion parameters are hard to
learn, and we do not need any assumption related to the network structure or
diffusion parameters.
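
For contrast with the sampling setting studied here, the canonical setting (network and diffusion parameters given as input) can be sketched in a few lines of Python. The sketch assumes the independent cascade diffusion model and uses the standard greedy heuristic with Monte Carlo spread estimates; both are assumptions chosen for illustration, and this is not the IMS algorithm or the network inference procedure of the paper. The function names and the toy graph in the usage note are hypothetical.

```python
import random


def simulate_ic(graph, seeds, rng):
    """One independent-cascade run; returns the number of activated nodes.

    graph[u] maps each out-neighbor v to the activation probability of edge (u, v).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # Each newly activated node gets one chance per inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def estimate_spread(graph, seeds, rng, runs=500):
    """Average number of activated nodes over repeated simulated cascades."""
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, nodes, k, seed=0, runs=500):
    """Greedily add the node whose inclusion gives the largest estimated spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        candidates = [v for v in nodes if v not in chosen]
        best = max(candidates,
                   key=lambda v: estimate_spread(graph, chosen + [v], rng, runs))
        chosen.append(best)
    return chosen
```

For example, on a hypothetical graph where `"hub"` points to five leaves with probability 0.9 each and `"x"` points to `"y"` with probability 0.5, `greedy_seeds(graph, nodes, 2)` would be expected to favor the hub first, since its estimated spread is far larger than that of any other single node. In the sampling setting of this paper, `graph` is not available and must instead be inferred, or bypassed, using the observed cascades.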

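This paper proposes a sampling-based influence maximization method. Given log data recording how influence spreads among nodes, it uses a novel network inference method to learn the network structure and diffusion parameters, avoiding the errors introduced by assumptions on the network structure and parameters. Compared with prior methods, it requires neither maximum-likelihood estimation nor convex programming, and it guarantees a constant approximation ratio even when the network parameters are hard to learn.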