Human-object interaction (HOI) recognition has progressed rapidly,
but most existing models are confined to single-stage reasoning
pipelines. Considering the intrinsic complexity of the task, we introduce a
cascade architecture for a multi-stage, coarse-to-fine HOI under