Current language models have a significant limitation in the ability to
encode and decode factual knowledge. This is mainly because they acquire such
knowledge from statistical co-occurrences although most of the knowledge words
are rarely observed. In this paper, we propose a Neural K