We present a novel $l_1$-regularized off-policy convergent TD-learning method
(termed RO-TD), which learns sparse representations of value
functions with low computational complexity. The algorithmic framework
underlying RO-TD integrates two key ideas: off-policy convergent gradient TD methods, such as GTD and TDC, and a convex-concave saddle-point formulation of non-smooth convex optimization, which enables first-order solvers and feature selection using online convex regularization.