In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world. Estimating the gradient of this loss function using samples lies at the core of gradient-based learning algorithms for these problems. We introduce the formalism of stochastic computation graphs (directed acyclic graphs that include both deterministic functions and conditional probability distributions) and describe how to easily and automatically derive an unbiased estimator of the loss function's gradient. The resulting algorithm for computing the gradient estimator is a simple modification of the standard backpropagation algorithm. The generic scheme we propose unifies estimators derived in a variety of prior work, along with the variance-reduction techniques used there. It could assist researchers in developing intricate models involving a combination of stochastic and deterministic operations, enabling, for example, attention, memory, and control actions.
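As a concrete illustration, consider the smallest possible stochastic computation graph: one stochastic node x ~ N(theta, 1) feeding one deterministic cost c(x) = x^2. The expected cost is theta^2 + 1, so the true gradient is 2 * theta. The sketch below, written with only the Python standard library, estimates that gradient from samples in two ways that this kind of framework covers: the score-function (likelihood-ratio) estimator, optionally with a constant baseline for variance reduction, and the pathwise (reparameterization) estimator. The function names, the Gaussian node, and the quadratic cost are illustrative assumptions. The gradients are derived by hand for this one-node graph; this is not the paper's general backpropagation-based algorithm.

```python
import random


def cost(x):
    # Deterministic cost node downstream of the stochastic node (illustrative choice).
    return x * x


def score_function_grad(theta, n_samples, baseline=0.0, seed=0):
    """Score-function estimate of d/dtheta E_{x ~ N(theta, 1)}[cost(x)].

    For a unit-variance Gaussian, d/dtheta log p(x; theta) = x - theta, so each
    sample contributes (cost(x) - baseline) * (x - theta). A constant baseline
    keeps the estimator unbiased because E[x - theta] = 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(theta, 1.0)
        total += (cost(x) - baseline) * (x - theta)
    return total / n_samples


def pathwise_grad(theta, n_samples, seed=0):
    """Pathwise estimate: write x = theta + eps with eps ~ N(0, 1), so dcost/dtheta = 2 * x."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        x = theta + eps
        total += 2.0 * x
    return total / n_samples


if __name__ == "__main__":
    theta = 1.5
    n = 200_000
    # E[x^2] = theta^2 + 1, so the exact gradient is 2 * theta = 3.0.
    print("exact:", 2.0 * theta)
    print("score function:", score_function_grad(theta, n))
    # The baseline here is the known mean cost, used only for illustration;
    # in practice a baseline would be estimated (e.g. a running average of costs).
    print("score function + baseline:", score_function_grad(theta, n, baseline=theta**2 + 1))
    print("pathwise:", pathwise_grad(theta, n))
```

All three printed estimates should land close to 3.0. For this particular setup the per-sample variance works out analytically to about 51.6 for the plain score-function estimator, 28 with the baseline, and 4 for the pathwise estimator. This illustrates why variance reduction and the choice between estimators matter even when every estimator is unbiased.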

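By introducing the formalism of stochastic computation graphs, the paper describes how to automatically derive an unbiased estimator of the loss function's gradient. It proposes an algorithm for computing this gradient estimator that unifies the estimators derived in previous work, along with the variance-reduction techniques used there. The algorithm lets researchers develop complex models that combine stochastic and deterministic operations, including attention, memory, and control actions.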

Gradient Estimation Using Stochastic Computation Graphs