agents that assist people need to have well-initialized policies that can
adapt quickly to align with their partners' reward functions. Initializing
policies to maximize performance with unknown partners can be achieved by
bootstrapping nonlinear models using imitation learning over la