Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all.

本研究旨在通过建立一个深度学习模型，同时训练图像分类、多语种翻译、图片描述生成、语音识别以及英文解析等多个任务，并在各个任务上都达到良好的结果。该模型特别之处在于将多个计算块集成到架构中，如卷积层、注意力机制和稀疏门控层等，每个计算块都对一部分任务至关重要，但即使对于不是关键任务的块，也不会影响其它任务的表现。同时，该研究还发现，少样本的任务集成到多个任务进行训练将会有较大提升。

学习所有模型的一种模型