This paper investigates the problem of interactively learning behaviors
communicated by a human teacher using positive and negative feedback. Much
previous work on this problem has made the assumption that people provide
feedback for decisions that is dependent on the behavior they are teaching and
is independent from the learner's current policy. We present