Offline reinforcement learning algorithms promise to be applicable in
settings where a fixed dataset is available and no new experience can be
acquired. However, such a formulation is inevitably offline-data-hungry and, in
practice, collecting a large offline dataset for one specific tas