The current standard approach to fine-tuning transformer-based language
models uses a fixed number of training epochs and a linear learning rate
schedule. To obtain a near-optimal model for a given downstream task, a
search over the optimization hyperparameter space is usually performed.
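As a concrete illustration, the sketch below implements this recipe in PyTorch: a fixed epoch budget with a learning rate that warms up and then decays linearly to zero. The specific values (peak learning rate, epoch count, warmup fraction) and the stand-in model are illustrative assumptions, not settings prescribed here.

```python
# Minimal sketch of the standard fine-tuning recipe: a fixed number of
# epochs and a linearly decaying learning rate. All concrete values are
# illustrative assumptions.
import torch
from torch.optim.lr_scheduler import LambdaLR

def linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps):
    """Warm up linearly to the peak LR, then decay linearly to zero."""
    def lr_lambda(step):
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)
        return max(0.0, (num_training_steps - step)
                   / max(1, num_training_steps - num_warmup_steps))
    return LambdaLR(optimizer, lr_lambda)

# Hypothetical model and data sizes, chosen only for the example.
model = torch.nn.Linear(768, 2)      # stand-in for a transformer classifier head
num_epochs = 3                       # fixed epoch budget
steps_per_epoch = 1000
num_training_steps = num_epochs * steps_per_epoch

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scheduler = linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # 10% warmup, an assumption
    num_training_steps=num_training_steps,
)

for epoch in range(num_epochs):
    for _ in range(steps_per_epoch):
        optimizer.zero_grad()
        loss = model(torch.randn(8, 768)).sum()  # dummy batch and loss
        loss.backward()
        optimizer.step()
        scheduler.step()  # advance the linear LR schedule once per step
```

Under this setup, the hyperparameter search mentioned above typically varies the peak learning rate, the epoch budget, and the batch size across several full fine-tuning runs.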