This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address how RNNLMs scale with respect to model size, training-set size, computational cost and memory. Our analysis shows that, despite being more costly to train, RNNLMs obtain much lower perplexities on standard benchmarks than n-gram models. We train the largest known RNNs and present relative word error rate gains of 18% on an ASR task. We also present the new lowest perplexities on the recently released billion word language modelling benchmark, a 1 BLEU point gain on machine translation and a 17% relative hit rate gain in word prediction.

Scaling Recurrent Neural Network Language Models