Creating useful reinforcement learning (RL) agents begins with designing a
suitable reward function that captures the nuances of the task. However, reward
engineering can be a difficult and time-consuming process. Instead,
human-in-the-loop (HitL) RL allows agents to learn reward functions from human
feedback. Despite recent successes, many of the HitL RL