July 2018

Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing

TL;DR: Memory Augmented Policy Optimization (MAPO) improves the sample efficiency and robustness of policy gradient methods on tasks with sparse rewards. When applied to program synthesis from natural language, it achieves state-of-the-art accuracy using only weak supervision.