We focus on procedural text understanding, the task of tracking the states and locations of entities throughout a natural process. Although recent approaches have made substantial progress, they still fall far short of human performance. Two challenges, difficulty of commonsense reasoning