Creating Automatic Speech Recognition (ASR) systems that are robust and
resilient to classroom conditions is paramount to the development of AI tools
to aid teachers and students. In this work, we study the efficacy of continued
pretraining (CPT) in adapting Wav2vec2.0 to the classroom domain. We show that
CPT is a powerful tool in that regard and reduces the Word Error Rate (WER) of
Wav2vec2.0-based models by upwards of 10%. More specifically, CPT improves the
model's robustness to different noises, microphones, and classroom conditions,
as well as to classroom demographics. Our CPT models show improved ability to
generalize to different demographics unseen in the labeled finetuning data.
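
For readers unfamiliar with the metric, the Word Error Rate mentioned above is conventionally the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used in the paper; the function name `word_error_rate` and the example sentences are illustrative assumptions, and real evaluations usually normalize text (casing, punctuation) first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the paper): whitespace tokenization,
    no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up classroom-style utterance: one substitution in five words.
wer = word_error_rate("two plus two is four", "two plus to is four")
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.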

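We study the effectiveness of continued pretraining (CPT) for adapting Wav2vec2.0 to the classroom domain. The results show that CPT is a powerful tool that reduces the Word Error Rate (WER) of Wav2vec2.0-based models by upwards of 10% and improves robustness to different noises, microphones, classroom conditions, and student demographics. Our CPT models also generalize better to demographics unseen in the labeled finetuning data.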

Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings

While general-purpose large language models (LLMs) demonstrate proficiency on
multiple tasks within the domain of translation, approaches based on open LLMs
are competitive only when specializing on a single task. In this paper, we
propose a recipe for tailoring LLMs to multiple tasks present in translation
workflows. We perform continued pretraining on a multilingual mixture of
monolingual and parallel data, creating TowerBase, followed by finetuning on
instructions relevant for translation processes, creating TowerInstruct. Our
final model surpasses open alternatives on several tasks relevant to
translation workflows and is competitive with general-purpose closed LLMs. To
facilitate future research, we release the Tower models, our specialization
dataset, an evaluation framework for LLMs focusing on the translation
ecosystem, and a collection of model generations, including ours, on our
benchmark.

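We propose a recipe for tailoring general-purpose large language models to multiple translation-related tasks: continued pretraining on a multilingual mixture of monolingual and parallel data yields TowerBase, and finetuning on instructions relevant to translation processes yields TowerInstruct. Our final model surpasses open alternatives on several tasks relevant to translation workflows and is competitive with general-purpose closed LLMs. To facilitate future research, we release the Tower models, our specialization dataset, an evaluation framework for LLMs focused on the translation ecosystem, and a collection of model generations on our benchmark.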

Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

Recently introduced language model prompting methods can achieve high
accuracy in zero- and few-shot settings while requiring few to no learned
task-specific parameters. Nevertheless, these methods still often trail behind
full model finetuning. In this work, we investigate if a dedicated continued
pretraining stage could improve "promptability", i.e., zero-shot performance
with natural language prompts or few-shot performance with prompt tuning. We
reveal settings where existing continued pretraining methods lack
promptability. We also identify current methodological gaps, which we fill with
thorough large-scale experiments. We demonstrate that a simple recipe,
continued pretraining that incorporates a trainable prompt during multi-task
learning, leads to improved promptability in both zero- and few-shot settings
compared to existing methods, up to 31% relative. On the other hand, we find
that continued pretraining using MAML-style meta-learning, a method that
directly optimizes few-shot promptability, yields subpar performance. We
validate our findings with two prompt tuning methods, and, based on our
results, we provide concrete recommendations to optimize promptability for
different use cases.
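
The "up to 31% relative" figure above is a relative gain, meaning a ratio against the baseline score and not a difference in percentage points. As a small worked sketch of that arithmetic, the helper below computes a relative improvement; the function name `relative_improvement` and the scores 50.0 and 65.5 are hypothetical values chosen for illustration and are not results from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of the baseline.

    Illustrative helper (not from the paper). Assumes higher scores are better.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Hypothetical accuracy scores, not taken from the paper.
baseline_score = 50.0
improved_score = 65.5

absolute_gain = improved_score - baseline_score  # 15.5 points
relative_gain = relative_improvement(baseline_score, improved_score)  # 0.31
```

With these made-up scores, a 15.5-point absolute gain corresponds to a 31% relative gain, which shows why the two ways of reporting an improvement should not be confused.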

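This work investigates whether a dedicated continued pretraining stage can improve the zero-shot and few-shot performance of language model prompting methods. Large-scale experiments show that continued pretraining that incorporates a trainable prompt during multi-task learning improves performance in both zero- and few-shot settings by up to 31% relative, whereas continued pretraining with MAML-style meta-learning performs poorly. We provide concrete recommendations for optimizing promptability in different use cases.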