We characterise some of the quirks and shortcomings in the exploration of visual dialogue (VD), a sequential question-answering task in which the questions and corresponding answers are related through given visual stimuli. To do so, we develop an embarrassingly simple method based on