Stochastic Gradient Descent (SGD) with adaptive steps is now widely used for training deep neural networks. Most theoretical results assume access to unbiased gradient estimators, an assumption that fails in several recent deep learning and reinforcement learning applications that rely on Monte Carlo methods. This paper provides a comprehensive non-asymptotic analysis of SGD with biased gradients and adaptive steps for convex and non-convex smooth functions. Our study incorporates time-dependent bias and emphasizes the importance of controlling the bias and Mean Squared Error (MSE) of the gradient estimator. In particular, we establish that Adagrad and RMSProp with biased gradients converge to critical points for smooth non-convex functions at a rate similar to existing results in the literature for the unbiased case. Finally, we provide experimental results using Variational Autoencoders (VAEs) that illustrate our convergence results and show how the effect of bias can be reduced by appropriate hyperparameter tuning.
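
To make the setting concrete, the following is a minimal sketch of Adagrad driven by a biased gradient estimator. It is not the experimental setup of the paper: the objective f(x) = 0.5 * ||x||^2, the bias schedule decaying like 1/sqrt(t + 1), the noise level, and the names `biased_grad`, `adagrad_biased`, and `grad_norm` are all assumptions chosen for illustration.

```python
import math
import random


def biased_grad(x, t, rng, bias_scale=1.0, noise_std=0.1):
    """Noisy, biased estimate of the gradient of f(x) = 0.5 * ||x||^2.

    The true gradient is x itself. The additive bias decays like
    1 / sqrt(t + 1); this schedule is a hypothetical choice, not one
    taken from the paper.
    """
    bias = bias_scale / math.sqrt(t + 1)
    return [xi + bias + rng.gauss(0.0, noise_std) for xi in x]


def adagrad_biased(x0, n_steps=2000, lr=0.5, eps=1e-8, bias_scale=1.0, seed=0):
    """Run coordinate-wise Adagrad using the biased estimator above."""
    rng = random.Random(seed)
    x = list(x0)
    accum = [0.0] * len(x)  # running sum of squared gradient estimates
    for t in range(n_steps):
        g = biased_grad(x, t, rng, bias_scale)
        for i, gi in enumerate(g):
            accum[i] += gi * gi
            # Adaptive step: the learning rate is scaled per coordinate
            # by the root of the accumulated squared estimates.
            x[i] -= lr * gi / (math.sqrt(accum[i]) + eps)
    return x


def grad_norm(x):
    """Norm of the true gradient of f at x (equal to ||x|| here)."""
    return math.sqrt(sum(xi * xi for xi in x))


x_start = [3.0, -2.0]
x_final = adagrad_biased(x_start)
print("initial gradient norm:", grad_norm(x_start))
print("final gradient norm:  ", grad_norm(x_final))
```

In this toy problem the iterates settle near the point where the biased estimate vanishes on average, so a bias that shrinks over time lets the true gradient norm shrink with it. Changing `bias_scale` or the decay schedule is one way to explore how bias control affects the final gradient norm.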

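This work gives a non-asymptotic analysis of stochastic gradient descent with biased gradients and adaptive steps, covering time-dependent bias and control of the mean squared error of the gradient estimator. The results show that Adagrad and RMSProp with biased gradients converge at a rate similar to that of the unbiased case. Experiments further support the convergence results and show that appropriate hyperparameter tuning can reduce the effect of bias.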

Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation