Executing computer vision models on streaming visual data, or streaming
perception is an emerging problem, with applications in self-driving, embodied
agents, and augmented/virtual reality. The development of such systems is
largely governed by the accuracy and latency of the processin