We focus on multi-domain Neural Machine Translation (NMT), with the goal of developing efficient models that can handle data from the various domains seen during training and are robust to domains unseen during training. We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task. They enable efficient model scaling, which helps accommodate a variety of multi-domain data. They also allow flexible parameter sharing between domains, which could enable knowledge transfer between similar domains while limiting negative transfer. We conduct a series of experiments aimed at validating the utility of SMoE in the multi-domain scenario. We find that straightforward width scaling of the Transformer is, in practice, a simpler and, surprisingly, more efficient approach, and it reaches the same level of performance as SMoE. We also search for a better recipe for the robustness of multi-domain systems. We highlight the importance of mixing in a generic domain, i.e., Paracrawl, and introduce a simple technique, domain randomization.

Exploring the Potential of Sparse Mixture-of-Experts Models for Multi-Domain Neural Machine Translation