State-of-the-art reinforcement learning algorithms mostly rely on being
allowed to directly interact with their environment to collect millions of
observations. This makes it hard to transfer their success to industrial
control problems, where simulations are often very costly or do no