One of the central skills that language learners need to practice is speaking
the language. Currently, students in school do not get enough speaking
opportunities and lack conversational practice. Recent advances in speech
technology and natural language processing make it possible to build novel
tools that let students practice their speaking skills. In this work, we tackle
the first component of such a tool's pipeline, the automatic speech recognition
(ASR) module, which faces two challenges. First, state-of-the-art ASR models
are often trained on read-aloud speech from adult native speakers and do not
transfer well to the speech of young language learners. Second, most ASR
systems contain a powerful language model that smooths out errors made by the
speakers. To give corrective feedback, a crucial part of language learning,
the ASR system in our setting must preserve the errors made by the language
learners. We build an ASR system that satisfies these requirements: it works
on spontaneous speech by young language learners and preserves their errors.
To this end, we collected a corpus of around 85 hours of English audio spoken
by learners in Switzerland in grades 4 to 6 on different language-learning
tasks, which we used to train an ASR model. Our experiments show that our
model benefits from direct fine-tuning on children's voices and has a much
higher error preservation rate than other models.
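
The abstract does not define the error preservation rate. As a minimal sketch, assuming a hypothetical definition (the share of annotated learner errors in a reference transcript that reappear word-for-word, at aligned positions, in the ASR output), it could be computed by aligning the two word sequences with Python's difflib. The function name and the error_positions annotation format are illustrative assumptions, not the authors' metric.

```python
from difflib import SequenceMatcher


def error_preservation_rate(reference, hypothesis, error_positions):
    """Share of annotated learner errors kept verbatim by the ASR output.

    Hypothetical metric for illustration: `error_positions` lists word
    indices in `reference` (the human transcript) that annotators marked
    as learner errors.
    """
    if not error_positions:
        raise ValueError("need at least one annotated error position")
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    # Align the word sequences; each matching block is a run of identical words.
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    matched = set()
    for block in matcher.get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))
    # An error counts as preserved if its reference word was matched exactly.
    preserved = sum(1 for i in error_positions if i in matched)
    return preserved / len(error_positions)


if __name__ == "__main__":
    ref = "he go to school yesterday"  # learner error at index 1 ("go")
    print(error_preservation_rate(ref, "he went to school yesterday", [1]))  # 0.0
    print(error_preservation_rate(ref, "he go to school yesterday", [1]))    # 1.0
```

Under this definition, an ASR system whose language model "repairs" "he go" to "he went" scores 0 on that utterance, while one that transcribes the learner verbatim scores 1.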

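In this work, we build an automatic speech recognition system that satisfies these requirements: it handles spontaneous speech by young language learners and preserves their errors.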


Error-preserving Automatic Speech Recognition of Young English Learners' Language

Humans are efficient language learners and inherently social creatures. Our
language development is largely shaped by our social interactions, for example,
the demonstration and feedback from caregivers. In contrast to human language
learning, recent large language models have primarily adopted a
non-interactive training paradigm, refining pre-trained models through
feedback only afterward. In this work, we aim to examine how corrective
feedback from interactions influences neural language acquisition from the
ground up through systematically controlled experiments, assessing whether it
contributes to learning efficiency in language models. We introduce a
trial-and-demonstration (TnD) learning framework that incorporates three
components: student trials, teacher demonstrations, and a reward conditioned on
language competence at various developmental stages. Our experiments reveal
that the TnD approach accelerates word acquisition for student models with an
equal or smaller number of parameters, and we highlight the significance of
both trials and demonstrations. We further show that the teacher's word
choices influence the students' word-specific learning efficiency, and that a
practice-makes-perfect effect is evident in a strong correlation between the
frequency of words in trials and their respective learning curves. Our findings
suggest that interactive language learning, with teacher demonstrations and
student trials, can facilitate efficient word learning in language models.
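
The practice-makes-perfect analysis relates how often a word appears in student trials to how well it is learned. A minimal sketch, assuming each word's learning curve is a list of per-checkpoint scores (higher meaning better learned) summarized by its mean, i.e. a normalized area under the curve; this summary statistic, the use of Pearson's r, and all function names are illustrative choices, not necessarily those of the authors.

```python
from math import sqrt


def curve_summary(scores):
    """Mean score over equally spaced checkpoints (a normalized area under
    the learning curve); higher means the word was learned earlier/better."""
    return sum(scores) / len(scores)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def practice_correlation(trial_counts, learning_curves):
    """Correlate how often each word occurs in student trials with how well
    it is learned. Both arguments are dicts keyed by word (hypothetical
    inputs: counts from trial logs, scores from evaluation checkpoints)."""
    words = sorted(trial_counts)
    freqs = [trial_counts[w] for w in words]
    summaries = [curve_summary(learning_curves[w]) for w in words]
    return pearson_r(freqs, summaries)
```

A positive r means words practiced more often in trials tend to have higher learning curves; a rank correlation such as Spearman's would be a natural alternative if trial frequencies are heavily skewed.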

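Through systematically controlled experiments, we study how interaction influences neural language learning and find that interactive language learning, with teacher demonstrations and student trials, helps improve the word-learning efficiency of language models.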


Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations

There is a growing interest in applying pre-trained large language models
(LLMs) to planning problems. However, methods that use LLMs directly as
planners are currently impractical due to several factors, including limited
correctness of plans, strong reliance on feedback from interactions with
simulators or even the actual environment, and inefficient use of human
feedback. In this work, we introduce a novel alternative paradigm that
constructs an explicit world (domain) model in the Planning Domain Definition
Language (PDDL) and then uses it to plan with sound domain-independent
planners. To address the fact that LLMs may not generate a fully functional
PDDL model initially, we employ LLMs as an interface between PDDL and sources
of corrective feedback, such as PDDL validators and humans. For users who lack
a background in PDDL, we show that LLMs can translate PDDL into natural
language and effectively encode corrective feedback back to the underlying
domain model. Our framework not only enjoys the correctness guarantee offered
by the external planners but also reduces human involvement by allowing users
to correct domain models at the beginning, rather than inspecting and
correcting (through interactive prompting) every generated plan as in previous
work. On two International Planning Competition (IPC) domains and a Household
domain that is more complex than commonly used benchmarks such as ALFWorld,
we demonstrate that GPT-4 can be
leveraged to produce high-quality PDDL models for over 40 actions, and the
corrected PDDL models are then used to successfully solve 48 challenging
planning tasks. Resources including the source code will be released at:
this https URL
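
The correctness guarantee comes from checking plans against the domain model's action preconditions and effects rather than trusting a plan generated directly by the LLM. The sketch below illustrates this on a simplified, grounded STRIPS-style subset of what PDDL expresses; it does not parse PDDL, the household action and predicate names are hypothetical, and breadth-first search stands in for the far more efficient heuristic planners used in practice.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Grounded STRIPS-style action, a simplified stand-in for a PDDL action."""
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false

    def applicable(self, state):
        return self.pre <= state

    def apply(self, state):
        return (state - self.delete) | self.add


def make_action(name, pre, add, delete=()):
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


def validate(plan, init, goal):
    """Plan validator: every step must be applicable and the goal must hold."""
    state = frozenset(init)
    for action in plan:
        if not action.applicable(state):
            return False
        state = action.apply(state)
    return set(goal) <= state


def bfs_plan(actions, init, goal):
    """Domain-independent breadth-first planner; returns a shortest plan or None."""
    goal = frozenset(goal)
    start = frozenset(init)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for action in actions:
            if action.applicable(state):
                nxt = action.apply(state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [action]))
    return None


if __name__ == "__main__":
    # Hypothetical household domain: move a cup from the counter to the sink.
    actions = [
        make_action("pick-up-cup", {"at-counter", "cup-on-counter", "hand-empty"},
                    {"holding-cup"}, {"cup-on-counter", "hand-empty"}),
        make_action("move-counter-sink", {"at-counter"}, {"at-sink"}, {"at-counter"}),
        make_action("move-sink-counter", {"at-sink"}, {"at-counter"}, {"at-sink"}),
        make_action("put-cup-in-sink", {"at-sink", "holding-cup"},
                    {"cup-in-sink", "hand-empty"}, {"holding-cup"}),
    ]
    init = {"at-counter", "cup-on-counter", "hand-empty"}
    plan = bfs_plan(actions, init, {"cup-in-sink"})
    print([a.name for a in plan])  # ['pick-up-cup', 'move-counter-sink', 'put-cup-in-sink']
```

Because bfs_plan only chains actions whose preconditions hold, any plan it returns passes validate; errors in the domain model itself (a missing precondition, say) are what the corrective-feedback loop with validators and humans is meant to catch.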

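This work introduces a new approach that uses PDDL to construct an explicit world model and uses pre-trained large language models as an interface between PDDL and sources of corrective feedback such as validators, improving the efficiency and correctness of planning. By correcting the PDDL model up front rather than every generated plan, the approach reduces human involvement while successfully solving complex planning tasks.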


Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning

Deep reinforcement learning can learn effective policies for a wide range of
tasks, but is notoriously difficult to use due to instability and sensitivity
to hyperparameters. The reasons for this remain unclear. When using standard
supervised methods (e.g., for bandits), on-policy data collection provides
"hard negatives" that correct the model in precisely those states and actions
that the policy is likely to visit. We call this phenomenon "corrective
feedback." We show that bootstrapping-based Q-learning algorithms do not
necessarily benefit from this corrective feedback, and training on the
experience collected by the algorithm is not sufficient to correct errors in
the Q-function. In fact, Q-learning and related methods can exhibit
pathological interactions between the distribution of experience collected by
the agent and the policy induced by training on that experience, leading to
potential instability, sub-optimal convergence, and poor results when learning
from noisy, sparse or delayed rewards. We demonstrate the existence of this
problem, both theoretically and empirically. We then show that a specific,
optimal correction to the data distribution can mitigate this issue. Based on
these observations, we propose a new algorithm, DisCor, which approximates
this optimal distribution and uses it to re-weight the
transitions used for training, resulting in substantial improvements in a range
of challenging RL settings, such as multi-task learning and learning from noisy
reward signals. A blog post summarizing this work is available at:
this https URL
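
DisCor re-weights training transitions so that those whose bootstrapping targets are likely to be inaccurate count less. A minimal sketch, assuming the weight form we understand from the DisCor paper, w_i proportional to exp(-gamma * Delta_i / tau), where Delta_i is an estimate of the accumulated Q-function error at the transition's next state-action pair. Here Delta is passed in directly rather than learned by an error model, and the function names and the temperature default are hypothetical.

```python
import math


def discor_weights(next_errors, gamma=0.99, temperature=10.0):
    """Normalized per-transition weights; a larger estimated error
    Delta(s', a') at the next state-action pair gives a smaller weight.

    Assumed form: w_i proportional to exp(-gamma * Delta_i / temperature).
    The temperature default is a hypothetical value.
    """
    logits = [-gamma * d / temperature for d in next_errors]
    top = max(logits)
    # Subtract the max logit before exponentiating for numerical stability.
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_td_loss(batch, weights, gamma=0.99):
    """Weighted squared Bellman error.

    batch: list of (q_sa, reward, next_value) tuples, where next_value is
    the bootstrapped value of the next state, e.g. max over a' of Q(s', a').
    """
    loss = 0.0
    for (q_sa, reward, next_value), w in zip(batch, weights):
        target = reward + gamma * next_value
        loss += w * (q_sa - target) ** 2
    return loss
```

Equal error estimates reduce to uniform weights, i.e. ordinary Q-learning on the replay data; as the estimated error at a transition's next state grows, its contribution to the loss shrinks.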

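Examines why Q-learning and related methods in deep reinforcement learning are unstable and hard to tune, and proposes DisCor, a new algorithm that improves learning by correcting the training data distribution.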