BriefGPT.xyz
Oct, 2022
Lila: A Unified Benchmark for Mathematical Reasoning
Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck...
TL;DR
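By constructing the Lila benchmark, we propose a way to evaluate and improve the performance of AI systems in mathematical reasoning. We find that multi-task learning can significantly improve performance, and that there is still room for improvement in general mathematical reasoning and understanding.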
Abstract
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propo