Robots often rely on a repertoire of previously-learned motion policies for
performing tasks of diverse complexities. When facing unseen task conditions or
when new task requirements arise, robots must adapt their motion policies
accordingly. In this context, policy optimization provides a principled
mechanism for refining previously learned policies to meet the new task
conditions.
Variational inference can be cast as optimization via Wasserstein gradient descent, which improves both the efficiency of the optimization and the alignment of the variational parameters with the true posterior.
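As a concrete illustration of this idea, the following is a minimal sketch of Wasserstein gradient descent for Gaussian variational inference against a Gaussian target, where the flow on covariances lives on the Bures-Wasserstein manifold. The target posterior, step size, and variable names are assumptions for illustration, not taken from the source.

```python
# Sketch: Bures-Wasserstein gradient descent for Gaussian variational
# inference (illustrative; target posterior and step size are assumed).
import numpy as np

# Assumed Gaussian target posterior p = N(mu_star, Sigma_star).
mu_star = np.array([1.0, -1.0])
Sigma_star = np.array([[2.0, 0.5], [0.5, 1.0]])
prec_star = np.linalg.inv(Sigma_star)  # target precision matrix

# Gaussian variational family q = N(m, S), initialized at a standard normal.
m = np.zeros(2)
S = np.eye(2)

eta = 0.1  # step size along the Wasserstein gradient flow of KL(q || p)
for _ in range(1000):
    # Mean update: gradient step on KL(q || p) with respect to m.
    m = m - eta * prec_star @ (m - mu_star)
    # Covariance update on the Bures-Wasserstein manifold:
    # S <- M S M with M = I - eta * (Sigma_star^{-1} - S^{-1}).
    M = np.eye(2) - eta * (prec_star - np.linalg.inv(S))
    S = M @ S @ M

# The variational parameters converge to those of the true posterior.
print(np.allclose(m, mu_star, atol=1e-4),
      np.allclose(S, Sigma_star, atol=1e-4))
```

Because both the target and the variational family are Gaussian here, the flow converges to the exact posterior; with a non-Gaussian target, the gradient of the log-density would replace the closed-form precision terms.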