Offline RL methods have been shown to reduce the need for environment interaction by training agents on offline-collected episodes. However, these methods typically require action information to be logged during data collection, which can be difficult or even impossible in some practical settings.