Differentiable simulators promise faster computation time for reinforcement
learning by replacing zeroth-order gradient estimates of a stochastic objective
with an estimate based on first-order gradients. However, it is still unclear
what factors decide the performance of the two estimat