We investigate the effective memory depth of RNN models by using them for n-gram language model (LM) smoothing. Experiments on a small corpus (UPenn Treebank, one million words of training data and 10k vocabulary) have found the LSTM cell with dropout to be the best model for encoding the n-gram state when compared with feed-forward and vanilla RNN models. When preserving the sentence independence assumption the LSTM n-gram matches the LSTM LM performance for n=9 and slightly outperforms it for n=13. When allowing dependencies across sentence boundaries, the LSTM 13-gram almost matches the perplexity of the unlimited history LSTM LM. LSTM n-gram smoothing also has the desirable property of improving with increasing n-gram order, unlike the Katz or Kneser-Ney back-off estimators. Using multinomial distributions as targets in training instead of the usual one-hot target is only slightly beneficial for low n-gram orders. Experiments on the One Billion Words benchmark show that the results hold at larger scale. Building LSTM n-gram LMs may be appealing for some practical situations: the state in a n-gram LM can be succinctly represented with (n-1)*4 bytes storing the identity of the words in the context and batches of n-gram contexts can be processed in parallel. On the downside, the n-gram context encoding computed by the LSTM is discarded, making the model more expensive than a regular recurrent LSTM LM.

通过使用RNN模型进行$n$-gram语言模型平滑来研究其有效的记忆深度，实验结果表明，在保持句子独立性假设的前提下，使用dropout技术的LSTM cell在编码$n$-gram状态方面的表现最佳，且在$n=9$时，LSTM $n$-gram与LSTM LM表现相当，同时在$n=13$时略优于其，该方法可以提高模型的性能，特别适用于模拟短格式文本如语音搜索/查询语言模型。

使用循环神经网络估计N元语言模型