BriefGPT.xyz
Feb, 2021
Transferring Domain Knowledge with an Adviser in Continuous Tasks
Rukshan Wijesinghe, Kasun Vithanage, Dumindu Tissera, Alex Xavier, Subha Fernando...
TL;DR
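Integrates an adviser into the Deep Deterministic Policy Gradient (DDPG) algorithm so that domain knowledge, in the form of pre-learned policies or predefined relationships, can be incorporated into the learning process to accelerate learning and improve the resulting policy.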
Abstract
Recent advances in reinforcement learning (RL) have surpassed human-level performance in many simulated environments. However, existing reinforcement learning techniques are incapable of explicitly incorporating