Over the past decade, deep neural networks have demonstrated significant success using the training scheme that involves mini-batch stochastic gradient descent on extensive datasets. Expanding upon this accomplishment, there has been a surge in research exploring the application of neu