While deep generative models have succeeded in image processing, natural
language processing, and reinforcement learning, training that involves
discrete random variables remains challenging due to the high variance of its
gradient estimation process. Monte Carlo is a common solution used in most
variance reduction approaches. However, this involves time-consuming resampling
and multiple function evaluations. We propose a Gapped Straight-Through (GST)
estimator to reduce the variance without incurring resampling overhead. This
estimator is inspired by the essential properties of Straight-Through
Gumbel-Softmax. We determine these properties and show via an ablation study
that they are essential. Experiments demonstrate that the proposed GST
estimator enjoys better performance compared to strong baselines on two
discrete deep generative modeling tasks, MNIST-VAE and ListOps.
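
For context, the Straight-Through Gumbel-Softmax sampler that inspired GST can be sketched as below. This is a minimal pure-Python illustration of that baseline's forward pass, not the GST estimator itself; the function name `st_gumbel_softmax`, its arguments, and the default temperature are assumptions made for this example.

```python
import math
import random


def st_gumbel_softmax(logits, tau=1.0, rng=random):
    """Return (hard one-hot sample, soft relaxed sample) for one categorical draw.

    Hypothetical helper for illustration only.
    """
    perturbed = []
    for logit in logits:
        # Clamp U away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        perturbed.append((logit + gumbel) / tau)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    m = max(perturbed)
    exps = [math.exp(p - m) for p in perturbed]
    total = sum(exps)
    soft = [e / total for e in exps]

    # Hard sample: one-hot at the argmax of the relaxed sample.
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    return hard, soft
```

In an automatic-differentiation framework, the straight-through trick uses `hard` in the forward pass and backpropagates through `soft`; this sketch only computes the two forward quantities.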

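Proposes a flexible Gapped Straight-Through (GST) estimator that reduces the high variance of gradient estimation for discrete random variables; it outperforms other approaches on two discrete deep generative modeling tasks, MNIST-VAE and ListOps.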

Training Discrete Deep Generative Models with a Gapped Straight-Through Estimator

Training Discrete Deep Generative Models via Gapped Straight-Through Estimator

We derive an unbiased estimator for expectations over discrete random
variables based on sampling without replacement, which reduces variance as it
avoids duplicate samples. We show that our estimator can be derived as the
Rao-Blackwellization of three different estimators. Combining our estimator
with REINFORCE, we obtain a policy gradient estimator and we reduce its
variance using a built-in control variate which is obtained without additional
model evaluations. The resulting estimator is closely related to other gradient
estimators. Experiments with a toy problem, a categorical Variational
Auto-Encoder and a structured prediction problem show that our estimator is the
only estimator that is consistently among the best estimators in both high and
low entropy settings.
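
One standard way to draw distinct categories without replacement is the Gumbel-top-k trick: perturb each log-probability with independent Gumbel noise and keep the k largest. The sketch below shows only this sampling step, not the estimator described above; whether the paper's implementation samples exactly this way is an assumption here, and `gumbel_top_k` is a hypothetical name.

```python
import math
import random


def gumbel_top_k(log_probs, k, rng=random):
    """Sample k distinct category indices without replacement.

    Hypothetical helper: perturbs each log-probability with Gumbel(0, 1)
    noise and returns the indices of the k largest perturbed values.
    """
    keys = []
    for i, lp in enumerate(log_probs):
        # Clamp U away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        keys.append((lp - math.log(-math.log(u)), i))
    keys.sort(reverse=True)
    return [i for _, i in keys[:k]]
```

Because every index appears at most once in the result, the k evaluations of the objective are never wasted on duplicate samples.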

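This paper proposes an unbiased estimator for expectations over discrete random variables based on sampling without replacement. Combining it with REINFORCE yields a policy gradient estimator with a built-in control variate, which performs well across several tasks.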

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

Within many machine learning algorithms, a fundamental problem concerns
efficient calculation of an unbiased gradient with respect to parameters $\boldsymbol{\gamma}$ for
expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]$. Most existing
methods either (i) suffer from high variance, seeking help from (often)
complicated variance-reduction techniques; or (ii) they only apply to
reparameterizable continuous random variables and employ a reparameterization
trick. To address these limitations, we propose a General and One-sample (GO)
gradient that (i) applies to many distributions associated with
non-reparameterizable continuous or discrete random variables, and (ii) has the
same low variance as the reparameterization trick. We find that the GO gradient
often works well in practice based on only one Monte Carlo sample (although one
can of course use more samples if desired). Alongside the GO gradient, we
develop a means of propagating the chain rule through distributions, yielding
statistical back-propagation, coupling neural networks to common random
variables.
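
To make the high-variance baseline concrete, the sketch below works through the score-function (REINFORCE) estimator for a single Bernoulli variable, where the exact gradient is available in closed form for comparison. It does not implement the GO gradient; the Bernoulli-with-logit parameterization and all function names are assumptions chosen for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_function_estimate(f, y, theta):
    """One-sample REINFORCE estimate of d/dtheta E[f(y)], y ~ Bernoulli(sigmoid(theta))."""
    p = sigmoid(theta)
    # For a Bernoulli with logit theta, d/dtheta log q(y) = y - p.
    return f(y) * (y - p)


def exact_gradient(f, theta):
    """Closed-form gradient: E[f(y)] = p f(1) + (1 - p) f(0), dp/dtheta = p (1 - p)."""
    p = sigmoid(theta)
    return p * (1.0 - p) * (f(1) - f(0))


def estimator_mean_and_variance(f, theta):
    """Exact mean and variance of the one-sample estimator, enumerating y in {0, 1}."""
    p = sigmoid(theta)
    outcomes = [
        (1.0 - p, score_function_estimate(f, 0, theta)),
        (p, score_function_estimate(f, 1, theta)),
    ]
    mean = sum(w * g for w, g in outcomes)
    var = sum(w * (g - mean) ** 2 for w, g in outcomes)
    return mean, var
```

The estimator's mean matches `exact_gradient` (it is unbiased), while its variance is nonzero even in this two-outcome case, which is the limitation that variance-reduction techniques target.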

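Proposes a gradient estimator (the GO gradient) that applies to non-reparameterizable continuous or discrete random variables and has the same low variance as the reparameterization trick; also develops statistical back-propagation, which propagates the chain rule through distributions and couples neural networks to common random variables.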

GO Gradient for Expectation-Based Objectives

GO Gradient for Expectation-Based Objectives

Stochastic control-flow models (SCFMs) are a class of generative models that
involve branching on choices from discrete random variables. Amortized
gradient-based learning of SCFMs is challenging as most approaches targeting
discrete variables rely on their continuous relaxations---which can be
intractable in SCFMs, as branching on relaxations requires evaluating all
(exponentially many) branching paths. Tractable alternatives mainly combine
REINFORCE with complex control-variate schemes to improve the variance of naive
estimators. Here, we revisit the reweighted wake-sleep (RWS) (Bornschein and
Bengio, 2015) algorithm, and through extensive evaluations, show that it
outperforms current state-of-the-art methods in learning SCFMs. Further, in
contrast to the importance weighted autoencoder, we observe that RWS learns
better models and inference networks with increasing numbers of particles. Our
results suggest that RWS is a competitive, often preferable, alternative for
learning SCFMs.
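
RWS forms its updates from several particles, each weighted by a self-normalized importance weight. The sketch below computes only those weights from per-particle log-densities; the function name and the example inputs are hypothetical, and the model and inference network that would produce the log-densities are not shown.

```python
import math


def normalized_importance_weights(log_joint, log_q):
    """Self-normalized weights proportional to p(x, z_k) / q(z_k | x) for K particles.

    log_joint[k] and log_q[k] are assumed to hold log p(x, z_k) and
    log q(z_k | x) for the k-th particle.
    """
    log_w = [lp - lq for lp, lq in zip(log_joint, log_q)]
    # Log-sum-exp for numerical stability.
    m = max(log_w)
    log_z = m + math.log(sum(math.exp(lw - m) for lw in log_w))
    return [math.exp(lw - log_z) for lw in log_w]
```

For example, `normalized_importance_weights([-1.0, -2.0, -3.0], [-1.5, -1.5, -1.5])` returns three positive weights that sum to one, with the largest weight on the first particle.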

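This paper studies the learning of stochastic control-flow models (SCFMs), a class of generative models. It revisits the reweighted wake-sleep (RWS) algorithm and shows through extensive evaluations that it outperforms current state-of-the-art methods for learning SCFMs, making it a competitive and often preferable alternative.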