Children from bilingual backgrounds benefit from interactions with parents and teachers to re-acquire their heritage language. In this paper, we investigate how this insight from behavioral studies can be incorporated into the learning of small-scale language models. We introduce BAMBINO-LM, a continual pretraining strategy for BabyLM that uses a novel combination of alternation and a PPO-based perplexity reward induced from a parent Italian model. When evaluated on zero-shot classification tasks for English and Italian, BAMBINO-LM improves the Italian language capability of a BabyLM baseline. Our ablation analysis demonstrates that employing both the alternation strategy and PPO-based modeling is key to this effectiveness gain. We also show that, as a side effect, the proposed method leads to a degradation in L1 effectiveness similar to what human children would experience in an equivalent learning scenario.

BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM