Clinical notes in electronic health records are written in highly heterogeneous styles and often contain non-standard terminology and abbreviations. Using these notes in predictive modeling has traditionally required preprocessing (e.g., selecting frequent terms or topic modeling) that removes much of the richness of the source data. We propose a pretrained hierarchical recurrent neural network model that parses minimally processed clinical notes in an intuitive fashion, and show that it improves performance on multiple classification tasks on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset. Compared to models that treat the notes as an unordered collection of terms or that are trained without pretraining, it increases top-5 recall to 89.7% (up by 4.8%) for primary diagnosis classification and AUPRC to 35.2% (up by 2.4%) for multilabel diagnosis classification. We also apply an attribution technique to several examples to identify the words and nearby context that the model uses to make its predictions, and show the importance of the words' context.
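
For readers unfamiliar with the top-5 recall figure reported for primary diagnosis classification, the following is a minimal sketch of how top-k recall is commonly computed for a single-label task: an example counts as a hit when its true class is among the k highest-scored classes. This is not the authors' evaluation code; the function name `top_k_recall` and the toy scores are illustrative assumptions.

```python
from typing import Sequence


def top_k_recall(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of examples whose true class is among the k highest-scored classes.

    Hypothetical helper for illustration; `scores[i][c]` is the model score for
    class `c` on example `i`, and `labels[i]` is the true class index.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    hits = 0
    for row, true_class in zip(scores, labels):
        # Class indices sorted by descending score; sorted() is stable,
        # so ties are broken in favor of the lower class index.
        ranked = sorted(range(len(row)), key=lambda c: -row[c])
        if true_class in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy data (made up): three examples, three candidate diagnoses.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]

recall_at_1 = top_k_recall(toy_scores, toy_labels, k=1)
recall_at_2 = top_k_recall(toy_scores, toy_labels, k=2)
```

On this toy data only the first example is ranked correctly at k=1, while the second is also recovered at k=2, which is why top-k recall is never lower than top-1 accuracy.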

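We propose a pretrained hierarchical recurrent neural network model that parses minimally processed clinical notes and outperforms traditional methods on discharge diagnosis classification tasks in medical informatics, and we apply an attribution technique to identify the words the model uses to make its predictions and their importance.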

Improved Hierarchical Patient Classification with Language Model Pretraining over Clinical Notes